Preface


Important though the general concepts and propositions may be with which
the modern and industrious passion for axiomatizing and generalizing has
presented us, in algebra perhaps more than anywhere else, nevertheless I am
convinced that the special problems in all their complexity constitute the
stock and core of mathematics, and that to master their difficulties requires
on the whole the harder labor.

Hermann Weyl


This book began about 20 years ago in the form of supplementary notes for my
algebra classes. I wanted to discuss some concrete topics such as symmetry, linear
groups, and quadratic number fields in more detail than the text provided, and to
shift the emphasis in group theory from permutation groups to matrix groups.
Lattices, another recurring theme, appeared spontaneously. My hope was that the
concrete material would interest the students and that it would make the abstractions
more understandable, in short, that they could get further by learning both at the
same time. This worked pretty well. It took me quite a while to decide what I
wanted to put in, but I gradually handed out more notes and eventually began
teaching from them without another text. This method produced a book which is, I think,
somewhat different from existing ones. However, the problems I encountered while
fitting the parts together caused me many headaches, so I can’t recommend starting
this way.

The main novel feature of the book is its increased emphasis on special topics. 
They tended to expand each time the sections were rewritten, because I noticed over 
the years that, with concrete mathematics in contrast to abstract concepts, students 
often prefer more to less. As a result, the ones mentioned above have become major 
parts of the book. There are also several unusual short subjects, such as the
Todd-Coxeter algorithm and the simplicity of $PSL_2$.


In writing the book, I tried to follow these principles:

1. The main examples should precede the abstract definitions. 

2. The book is not intended for a “service course,” so technical points should be 
presented only if they are needed in the book.

3. All topics discussed should be important for the average mathematician. 

Though these principles may sound like motherhood and the flag, I found it useful to 
have them enunciated, and to keep in mind that “Do it the way you were taught” 
isn’t one of them. They are, of course, violated here and there.

The table of contents gives a good idea of the subject matter, except that a first 
glance may lead you to believe that the book contains all of the standard material in 
a beginning algebra course, and more. Looking more closely, you will find that 
things have been pared down here and there to make space for the special topics. I 
used the above principles as a guide. Thus having the main examples in hand before 
proceeding to the abstract material allowed some abstractions to be treated more 
concisely. I was also able to shorten a few discussions by deferring them until the 
students have already overcome their inherent conceptual difficulties. The discussion 
of Peano’s axioms in Chapter 10, for example, has been cut to two pages. Though 
the treatment given there is very incomplete, my experience is that it suffices to give
the students the flavor of the axiomatic development of integer arithmetic. A more 
extensive discussion would be required if it were placed earlier in the book, and the 
time required for this wouldn’t be well spent. Sometimes the exercise of deferring 
material showed that it could be deferred forever — that it was not essential. This 
happened with dual spaces and multilinear algebra, for example, which wound up on
the floor as a consequence of the second principle. With a few concepts, such as the
minimal polynomial, I ended up believing that their main purpose in introductory
algebra books has been to provide a convenient source of exercises.

The chapters are organized following the order in which I usually teach a 
course, with linear algebra, group theory, and geometry making up the first
semester. Rings are first introduced in Chapter 10, though that chapter is logically 
independent of many earlier ones. I use this unusual arrangement because I want to 
emphasize the connections of algebra with geometry at the start, and because,
overall, the material in the first chapters is the most important for people in other fields.
The drawback is that arithmetic is given short shrift. This is made up for in the later
chapters, which have a strong arithmetic slant. Geometry is brought back from time
to time in these later chapters, in the guise of lattices, symmetry, and algebraic
geometry.


Michael Artin 
December 1990 



A Note for the Teacher 


There are few prerequisites for this book. Students should be familiar with calculus, 
the basic properties of the complex numbers, and mathematical induction. Some
acquaintance with proofs is obviously useful, though less essential. The concepts from
topology, which are used in Chapter 8, should not be regarded as prerequisites. An
appendix is provided as a reference for some of these concepts; it is too brief to be 
suitable as a text. 

Don’t try to cover the book in a one-year course unless your students have
already had a semester of algebra, linear algebra for instance, and are mathematically
fairly mature. About a third of the material can be omitted without sacrificing much
of the book’s flavor, and more can be left out if necessary. The following sections,
for example, would make a coherent course:

Chapter 1, Chapter 2, Chapter 3: 1-4, Chapter 4, Chapter 5: 1-7,
Chapter 6: 1,2, Chapter 7: 1-6, Chapter 8: 1-3,5, Chapter 10: 1-7,
Chapter 11: 1-8, Chapter 12: 1-7, Chapter 13: 1-6.

This selection includes some of the interesting special topics: symmetry of plane
figures, the geometry of $SU_2$, and the arithmetic of imaginary quadratic number
fields. If you don’t want to discuss such topics, then this is not the book for you.

It would be easy to spend an entire semester on the first four chapters, but this 
would defeat the purpose of the book. Since the real fun starts with Chapter 5, it is 
important to move along. If you plan to follow the chapters in order, try to get to 
that chapter as soon as is practicable, so that it can be done at a leisurely pace. It will 
help to keep attention focussed on the concrete examples. This is especially
important in the beginning for the students who come to the course without a clear idea of
what constitutes a proof.

Chapter 1, matrix operations, isn’t as exciting as some of the later ones, so it
should be covered fairly quickly. I begin with it because I want to emphasize the
general linear group at the start, instead of following the more customary practice of
basing examples on the symmetric group. The reason for this decision is Principle 3
of the preface: The general linear group is more important.

Here are some suggestions for Chapter 2:


1. Treat the abstract material with a light touch. You can have another go at it in 
Chapters 5 and 6. 

2. For examples, concentrate on matrix groups. Mention permutation groups only in
passing. Because of their inherent notational difficulties, examples from
symmetry such as the dihedral groups are best deferred to Chapter 5.

3. Don’t spend too much time on arithmetic. Its natural place in this book is
Chapters 10 and 11.

4. Deemphasize the quotient group construction. 

Quotient groups present a pedagogical problem. While their construction is
conceptually difficult, the quotient is readily presented as the image of a homomorphism in
most elementary examples, and so it does not require an abstract definition. Modular
arithmetic is about the only convincing example for which this is not the case. And
since the integers modulo n form a ring, modular arithmetic isn’t the ideal
motivating example for quotients of groups. The first serious use of quotient groups comes
when generators and relations are discussed in Chapter 6, and I deferred the
treatment of quotients to that point in early drafts of the book. But fearing the outrage of
the algebra community I ended up moving it to Chapter 2. Anyhow, if you don’t
plan to discuss generators and relations for groups in your course, then you can defer
an in-depth treatment of quotients to Chapter 10, ring theory, where they play a
central role, and where modular arithmetic becomes a prime motivating example.

In Chapter 3, vector spaces, I’ve tried to set up the computations with bases in
such a way that the students won’t have trouble keeping the indices straight. I’ve
probably failed, but since the notation is used throughout the book, it may be
advisable to adopt it.

The applications of linear operators to rotations and linear differential
equations in Chapter 4 should be discussed because they are used later on, but the
temptation to give differential equations their due has to be resisted. This heresy will be
forgiven because you are teaching an algebra course.

There is a gradual rise in the level of sophistication which is assumed of the 
reader throughout the first chapters, and a jump which I’ve been unable to eliminate 
occurs in Chapter 5. Had it not been for this jump, I would have moved symmetry 
closer to the beginning of the book. Keep in mind that symmetry is a difficult
concept. It is easy to get carried away by the material and to leave the students behind.

Except for its first two sections, Chapter 6 contains optional material. The last
section on the Todd-Coxeter algorithm isn’t standard; it is included to justify the
discussion of generators and relations, which is pretty useless without it.

There is nothing unusual in the chapter on bilinear forms, Chapter 7. I haven’t
overcome the main problem with this material, that there are too many variations on 
the same theme, but have tried to keep the discussion short by concentrating on the 
real and complex cases. 

In the chapter on linear groups, Chapter 8, plan to spend time on the geometry
of $SU_2$. My students complained every year about this chapter until I expanded the
sections on $SU_2$, after which they began asking for supplementary reading, wanting
to learn more. Many of our students are not familiar with the concepts from
topology when they take the course, and so these concepts require a light touch. But I’ve
found that the problems caused by the students’ lack of familiarity can be managed.
Indeed, this is a good place for them to get an idea of what a manifold is.
Unfortunately, I don't know a really satisfactory reference for further reading.

Chapter 9 on group representations is optional. I resisted including this topic
for a number of years, on the grounds that it is too hard. But students often request
it, and I kept asking myself: If the chemists can teach it, why can’t we? Eventually
the internal logic of the book won out and group representations went in. As a
dividend, hermitian forms got an application.

The unusual topic in Chapter 11 is the arithmetic of quadratic number fields. 
You may find the discussion too long for a general algebra course. With this
possibility in mind, I’ve arranged the material so that the end of Section 8, ideal
factorization, is a natural stopping point.

It seems to me that one should at least mention the most important examples of 
fields in a beginning algebra course, so I put a discussion of function fields into 
Chapter 13. 

There is always the question of whether or not Galois theory should be
presented in an undergraduate course. It doesn’t have quite the universal applicability
of most of the subjects in the book. But since Galois theory is a natural culmination
of the discussion of symmetry, it belongs here as an optional topic. I usually spend at
least some time on Chapter 14.

I considered grading the exercises for difficulty, but found that I couldn’t do it
consistently. So I’ve only gone so far as to mark some of the harder ones with an
asterisk. I believe that there are enough challenging problems, but of course one
always needs more of the interesting, easier ones.

Though I’ve taught algebra for many years, several aspects of this book are
experimental, and I would be very grateful for critical comments and suggestions from
the people who use it.

“One, two, three, five, four ...”

“No Daddy, it’s one, two, three, four, five.”
“Well if I want to say one, two, three, five, four,
why can’t I?”
“That’s not how it goes.”
Matrices play a central role in this book. They form an important part of the theory,
and many concrete examples are based on them. Therefore it is essential to develop 
facility in matrix manipulation. Since matrices pervade much of mathematics, the 
techniques needed here are sure to be useful elsewhere. 

The concepts which require practice to handle are matrix multiplication and 
determinants. 


1. THE BASIC OPERATIONS

Let $m, n$ be positive integers. An $m \times n$ matrix is a collection of $mn$ numbers
arranged in a rectangular array with $m$ rows and $n$ columns:

(1.1)
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$$

For example, $\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix}$ is a $2 \times 3$ matrix.

The numbers in a matrix are called the matrix entries and are denoted by $a_{ij}$,
where $i, j$ are indices (integers) with $1 \le i \le m$ and $1 \le j \le n$. The index $i$ is
called the row index, and $j$ is the column index. So $a_{ij}$ is the entry which appears in
the $i$th row and $j$th column of the matrix.

In the example above, $a_{11} = 2$, $a_{13} = 0$, and $a_{23} = 5$.

We usually introduce a symbol such as $A$ to denote a matrix, or we may write it
as $(a_{ij})$.

A $1 \times n$ matrix is called an n-dimensional row vector. We will drop the index $i$
when $m = 1$ and write a row vector as

(1.2) $A = [a_1 \ \cdots \ a_n]$, or as $A = (a_1, \ldots, a_n)$.

The commas in this row vector are optional. Similarly, an $m \times 1$ matrix is an
m-dimensional column vector:

(1.3) $B = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}$.

A $1 \times 1$ matrix $[a]$ contains a single number, and we do not distinguish such a
matrix from its entry.



(1.4) Addition of matrices is vector addition:

$$(a_{ij}) + (b_{ij}) = (s_{ij}),$$

where $s_{ij} = a_{ij} + b_{ij}$ for all $i, j$. Thus

$$\begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 5 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 3 \\ 4 & -3 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 1 & 3 \\ 5 & 0 & 6 \end{bmatrix}.$$

The sum of two matrices $A, B$ is defined only when they are both of the same
shape, that is, when they are $m \times n$ matrices with the same $m$ and $n$.

(1.5) Scalar multiplication of a matrix by a number is defined as with vectors. The
result of multiplying a number $c$ and a matrix $(a_{ij})$ is another matrix:

$$c(a_{ij}) = (b_{ij}),$$

where $b_{ij} = ca_{ij}$ for all $i, j$. Thus

$$2\begin{bmatrix} 0 & 1 \\ 2 & 3 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 2 \\ 4 & 6 \\ 4 & 2 \end{bmatrix}.$$

Numbers will also be referred to as scalars.
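As a quick check of (1.4) and (1.5), here is a minimal Python sketch that stores a matrix as a list of rows. The helper names `mat_add` and `scalar_mul` are illustrative choices, not notation from the text.

```python
def mat_add(A, B):
    """Entrywise sum (1.4); defined only for matrices of the same shape."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def scalar_mul(c, A):
    """Scalar multiple (1.5): every entry is multiplied by c."""
    return [[c * a for a in row] for row in A]


# The sum of [2 1 0; 1 3 5] and [1 0 3; 4 -3 1].
S = mat_add([[2, 1, 0], [1, 3, 5]], [[1, 0, 3], [4, -3, 1]])

# Twice the 3 x 2 matrix [0 1; 2 3; 2 1].
T = scalar_mul(2, [[0, 1], [2, 3], [2, 1]])

print(S)
print(T)
```

The shape check mirrors the remark that a sum is defined only for matrices of the same shape.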
The complicated notion is that of matrix multiplication. The first case to learn
is the product $AB$ of a row vector $A$ (1.2) and a column vector $B$ (1.3), which is
defined when both are the same size, that is, $m = n$. Then the product $AB$ is the
$1 \times 1$ matrix or scalar

(1.6) $a_1b_1 + a_2b_2 + \cdots + a_mb_m$.

(This product is often called the “dot product” of the two vectors.) Thus

$$\begin{bmatrix} 1 & 3 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix} = 1 - 3 + 20 = 18.$$

The usefulness of this definition becomes apparent when we regard $A$ and $B$ as
vectors which represent indexed quantities. For example, consider a candy bar
containing $m$ ingredients. Let $a_i$ denote the number of grams of (ingredient)$_i$ per candy bar,
and let $b_i$ denote the cost of (ingredient)$_i$ per gram. Then the matrix product $AB = c$
computes the cost per candy bar:

(grams/bar) $\cdot$ (cost/gram) = (cost/bar).

On the other hand, the fact that we consider this to be the product of a row by a
column is an arbitrary choice.

In general, the product of two matrices $A$ and $B$ is defined if the number of
columns of $A$ is equal to the number of rows of $B$, say if $A$ is an $\ell \times m$ matrix and $B$
is an $m \times n$ matrix. In this case, the product is an $\ell \times n$ matrix. Symbolically,
$(\ell \times m) \cdot (m \times n) = (\ell \times n)$. The entries of the product matrix are computed by
multiplying all rows of $A$ by all columns of $B$, using rule (1.6) above. Thus if we
denote the product $AB$ by $P$, then

(1.7) $p_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}$.

This is the product of the $i$th row of $A$ and the $j$th column of $B$.
For example,

(1.8) $\begin{bmatrix} 0 & -1 & 2 \\ 3 & 4 & -6 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$.

This definition of matrix multiplication has turned out to provide a very convenient 
computational tool. 
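The row-by-column rule (1.7) translates almost literally into code. The following is a minimal sketch, assuming matrices are stored as lists of rows with 0-based indices (so `A[i][k]` plays the role of $a_{i+1,k+1}$); the function name `mat_mul` is a hypothetical helper. It is applied to the product of $\begin{bmatrix} 0 & -1 & 2 \\ 3 & 4 & -6 \end{bmatrix}$ with the column vector $(1, 4, 3)$.

```python
def mat_mul(A, B):
    """Product of an l x m matrix A and an m x n matrix B, following rule (1.7)."""
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    n = len(B[0])
    # Entry (i, j) is the product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]
            for i in range(len(A))]


A = [[0, -1, 2],
     [3, 4, -6]]
X = [[1], [4], [3]]   # a column vector is a 3 x 1 matrix

P = mat_mul(A, X)
print(P)
```

The size check encodes the condition that the number of columns of the left factor equals the number of rows of the right factor.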

Going back to our candy bar example, suppose that there are $\ell$ candy bars.
Then we may form a matrix $A$ whose $i$th row measures the ingredients of (bar)$_i$. If
the cost is to be computed each year for $n$ years, we may form a matrix $B$ whose $j$th
column measures the cost of the ingredients in (year)$_j$. The matrix product $AB = P$
computes the cost per bar: $p_{ij}$ = cost of (bar)$_i$ in (year)$_j$.

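Matrix notation was introduced in the nineteenth century to provide a
shorthand way of writing linear equations. The system of equations

$$\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \\
&\ \ \vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m
\end{aligned}$$

can be written in matrix notation as

(1.9) $AX = B$,

where $A$ denotes the coefficient matrix $(a_{ij})$, $X$ and $B$ are column vectors, and $AX$ is
the matrix product

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}.$$

Thus the matrix equation

$$\begin{bmatrix} 0 & -1 & 2 \\ 3 & 4 & -6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

represents the following system of two equations in three unknowns:

$$\begin{aligned}
-x_2 + 2x_3 &= 2 \\
3x_1 + 4x_2 - 6x_3 &= 1.
\end{aligned}$$

Equation (1.8) exhibits one solution: $x_1 = 1$, $x_2 = 4$, $x_3 = 3$.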

Formula (1.7) defining the product can also be written in “sigma” notation as

$$p_{ij} = \sum_{k=1}^{m} a_{ik}b_{kj} = \sum_{k} a_{ik}b_{kj}.$$

Each of these expressions is a shorthand notation for the sum (1.7) which defines the
product matrix.

Our two most important notations for handling sets of numbers are the $\Sigma$ or
sum notation as used above and matrix notation. The $\Sigma$ notation is actually the more
versatile of the two, but because matrices are much more compact we will use them
whenever possible. One of our tasks in later chapters will be to translate complicated
mathematical structures into matrix notation in order to be able to work with them
conveniently.

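Various identities are satisfied by the matrix operations, such as the distributive
laws

(1.10) $A(B + B') = AB + AB'$, and $(A + A')B = AB + A'B$

and the associative law

(1.11) $(AB)C = A(BC)$.

These laws hold whenever the matrices involved have suitable sizes, so that the
products are defined. For the associative law, for example, the sizes should be
$A = \ell \times m$, $B = m \times n$, and $C = n \times p$, for some $\ell, m, n, p$. Since the two products
(1.11) are equal, the parentheses are not required, and we will denote them by $ABC$.
The triple product $ABC$ is then an $\ell \times p$ matrix. For example, the two ways of
computing the product

$$ABC = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}$$

are

$$(AB)C = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 0 & 2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix} \quad\text{and}\quad A(BC) = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \begin{bmatrix} 2 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}.$$

Scalar multiplication is compatible with matrix multiplication in the obvious
sense:

(1.12) $c(AB) = (cA)B = A(cB)$.

The proofs of these identities are straightforward and not very interesting.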

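In contrast, the commutative law does not hold for matrix multiplication; that
is,

(1.13) $AB \ne BA$, usually.

In fact, if $A$ is an $\ell \times m$ matrix and $B$ is an $m \times \ell$ matrix, so that $AB$ and $BA$ are both
defined, then $AB$ is $\ell \times \ell$ while $BA$ is $m \times m$. Even if both matrices are square, say
$m \times m$, the two products tend to be different. For instance,

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad\text{while}\quad \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$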
Since matrix multiplication is not commutative, care must be taken when
working with matrix equations. We can multiply both sides of an equation $B = C$ on
the left by a matrix $A$, to conclude that $AB = AC$, provided that the products are
defined. Similarly, if the products are defined, then we can conclude that $BA = CA$.
We can not derive $AB = CA$ from $B = C$!
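The associative law and the failure of the commutative law can both be checked numerically. The sketch below is only an illustration: it assumes matrices are lists of rows and uses a hypothetical helper `mat_mul`. The matrices are $A = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 & 1 \end{bmatrix}$, $C = \begin{bmatrix} 2 & 0 \\ 1 & 1 \\ 0 & 1 \end{bmatrix}$ for associativity, and the pair $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ for non-commutativity.

```python
def mat_mul(A, B):
    """Row-by-column matrix product of lists of rows."""
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(len(B[0]))]
            for i in range(len(A))]


# Associativity: (AB)C and A(BC) agree.
A = [[1], [2]]
B = [[1, 0, 1]]
C = [[2, 0], [1, 1], [0, 1]]
left = mat_mul(mat_mul(A, B), C)
right = mat_mul(A, mat_mul(B, C))

# Non-commutativity: EF and FE differ.
E = [[0, 1], [0, 0]]
F = [[0, 0], [0, 1]]
EF = mat_mul(E, F)
FE = mat_mul(F, E)

print(left, right)
print(EF, FE)
```

A single numerical agreement is of course not a proof of (1.11); it only confirms the worked example.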

Any matrix all of whose entries are 0 is called a zero matrix and is denoted by
0, though its size is arbitrary. Maybe $0_{m \times n}$ would be better.

The entries $a_{ii}$ of a matrix $A$ are called its diagonal entries, and a matrix $A$
is called a diagonal matrix if its only nonzero entries are diagonal entries.

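The square $n \times n$ matrix whose only nonzero entries are 1 in each diagonal
position,

(1.14) $I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}$

is called the $n \times n$ identity matrix. It behaves like 1 in multiplication: If $A$ is an
$m \times n$ matrix, then

$$I_mA = A \quad\text{and}\quad AI_n = A.$$

Here are some shorthand ways of drawing the matrix $I_n$:

$$I_n = \begin{bmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{bmatrix} = \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix}.$$

We often indicate that a whole region in a matrix consists of zeros by leaving it
blank or by putting in a single 0.

We will use $*$ to indicate an arbitrary undetermined entry of a matrix. Thus

$$\begin{bmatrix} * & \cdots & * \\ & \ddots & \vdots \\ 0 & & * \end{bmatrix}$$

may denote a square matrix whose entries below the diagonal are 0, the other entries
being undetermined. Such a matrix is called an upper triangular matrix.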

Let $A$ be a (square) $n \times n$ matrix. If there is a matrix $B$ such that

(1.15) $AB = I_n$ and $BA = I_n$,

then $B$ is called an inverse of $A$ and is denoted by $A^{-1}$:

(1.16) $A^{-1}A = I_n = AA^{-1}$.



When $A$ has an inverse, it is said to be an invertible matrix. For example, the matrix
$A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$ is invertible. Its inverse is $A^{-1} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$, as is seen by computing
the products $AA^{-1}$ and $A^{-1}A$. Two more examples are:



We will see later that $A$ is invertible if there is a matrix $B$ such that either one of
the two relations $AB = I_n$ or $BA = I_n$ holds, and that $B$ is then the inverse [see
(2.23)]. But since multiplication of matrices is not commutative, this fact is not
obvious. It fails for matrices which aren’t square. For example, let $A = \begin{bmatrix} 1 & 2 \end{bmatrix}$ and let
$B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then $AB = [1] = I_1$, but $BA = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \ne I_2$.

On the other hand, an inverse is unique if it exists at all. In other words, there
can be only one inverse. Let $B, B'$ be two matrices satisfying (1.15), for the same
matrix $A$. We need only know that $AB = I_n$ ($B$ is a right inverse) and that $B'A = I_n$
($B'$ is a left inverse). By the associative law, $B'(AB) = (B'A)B$. Thus

(1.17) $B' = B'I = B'(AB) = (B'A)B = IB = B$,

and so $B' = B$. □
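The two-sided condition (1.15) is easy to test for a specific pair of matrices. The sketch below, which assumes list-of-rows matrices and hypothetical helpers `mat_mul` and `identity`, checks that $\begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$ is a two-sided inverse of $\begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$, and that for the non-square pair $A = \begin{bmatrix} 1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ only one of the two products is an identity matrix.

```python
def mat_mul(A, B):
    """Row-by-column matrix product of lists of rows."""
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    """The n x n identity matrix I_n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


# A square example: both products give I_2.
A = [[2, 1], [5, 3]]
A_inv = [[3, -1], [-5, 2]]
both_sides = mat_mul(A, A_inv) == identity(2) and mat_mul(A_inv, A) == identity(2)

# A non-square example: RC is I_1, but CR is not I_2.
R = [[1, 2]]        # 1 x 2
C = [[1], [0]]      # 2 x 1
RC = mat_mul(R, C)
CR = mat_mul(C, R)

print(both_sides, RC, CR)
```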


(1.18) Proposition. Let $A, B$ be $n \times n$ matrices. If both are invertible, so is their
product $AB$, and

$$(AB)^{-1} = B^{-1}A^{-1}.$$

More generally, if $A_1, \ldots, A_m$ are invertible, then so is the product $A_1 \cdots A_m$, and its
inverse is $A_m^{-1} \cdots A_1^{-1}$.



Proof. Assume that $A, B$ are invertible. Then we check that $B^{-1}A^{-1}$ is the
inverse of $AB$:

$$ABB^{-1}A^{-1} = AIA^{-1} = AA^{-1} = I,$$

and similarly

$$B^{-1}A^{-1}AB = B^{-1}IB = B^{-1}B = I.$$

The last assertion is proved by induction on $m$ [see Appendix (2.3)]. When $m = 1$,
the assertion is that if $A_1$ is invertible then $A_1^{-1}$ is the inverse of $A_1$, which is trivial.
Next we assume that the assertion is true for $m = k$, and we proceed to check it for
$m = k + 1$. We suppose that $A_1, \ldots, A_{k+1}$ are invertible $n \times n$ matrices, and we
denote by $P$ the product $A_1 \cdots A_k$ of the first $k$ matrices. By the induction hypothesis, $P$
is invertible, and its inverse is $A_k^{-1} \cdots A_1^{-1}$. Also, $A_{k+1}$ is invertible. So, by what has
been shown for two invertible matrices, the product $PA_{k+1} = A_1 \cdots A_kA_{k+1}$ is
invertible, and its inverse is $A_{k+1}^{-1}P^{-1} = A_{k+1}^{-1}A_k^{-1} \cdots A_1^{-1}$. This shows that the assertion is
true for $m = k + 1$, which completes the induction proof. □
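The order reversal in $(AB)^{-1} = B^{-1}A^{-1}$ is worth seeing on concrete numbers. In the sketch below the matrices $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$, together with their inverses, are illustrative choices made for this example; `mat_mul` is a hypothetical helper on list-of-rows matrices.

```python
def mat_mul(A, B):
    """Row-by-column matrix product of lists of rows."""
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(len(B[0]))]
            for i in range(len(A))]


I2 = [[1, 0], [0, 1]]

A, A_inv = [[2, 1], [5, 3]], [[3, -1], [-5, 2]]
B, B_inv = [[1, 1], [0, 1]], [[1, -1], [0, 1]]

AB = mat_mul(A, B)
candidate = mat_mul(B_inv, A_inv)     # B^{-1} A^{-1}: the order is reversed
wrong_order = mat_mul(A_inv, B_inv)   # A^{-1} B^{-1}: not the inverse in general

print(mat_mul(AB, candidate), mat_mul(candidate, AB))
print(mat_mul(AB, wrong_order))
```

For this pair the unreversed product $A^{-1}B^{-1}$ fails to invert $AB$, which is what non-commutativity leads one to expect.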
Though this isn’t clear from the definition of matrix multiplication, we will see
that most square matrices are invertible. But finding the inverse explicitly is not a
simple problem when the matrix is large.

The set of all invertible $n \times n$ matrices is called the n-dimensional general
linear group and is denoted by $GL_n$. The general linear groups will be among our most
important examples when we study the basic concept of a group in the next chapter.

Various tricks simplify matrix multiplication in favorable cases. Block
multiplication is one of them. Let $M, M'$ be $m \times n$ and $n \times p$ matrices, and let $r$ be an integer
less than $n$. We may decompose the two matrices into blocks as follows:

$$M = [A \mid B] \quad\text{and}\quad M' = \begin{bmatrix} A' \\ B' \end{bmatrix},$$

where $A$ has $r$ columns and $A'$ has $r$ rows. Then the matrix product can be computed
as follows:

(1.19) $MM' = AA' + BB'$.

This decomposition of the product follows directly from the definition of
multiplication, and it may facilitate computation. For example,



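$$\left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & 7 \end{array}\right] \begin{bmatrix} 2 & 3 \\ 4 & 8 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 4 & 8 \end{bmatrix} + \begin{bmatrix} 5 \\ 7 \end{bmatrix} \begin{bmatrix} 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 4 & 8 \end{bmatrix}.$$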






Note that formula (1.19) looks the same as rule (1.6) for multiplying a row 
vector and a column vector. 

We may also multiply matrices divided into more blocks. For our purposes, a
decomposition into four blocks will be the most useful. In this case the rule for block
multiplication is the same as for multiplication of $2 \times 2$ matrices. Let $r + s = n$ and
let $k + \ell = m$. Suppose we decompose an $m \times n$ matrix $M$ and an $n \times p$ matrix $M'$
into submatrices

$$M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \qquad M' = \begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix},$$

where the number of columns of $A$ is equal to the number of rows of $A'$. Then the
rule for block multiplication is

(1.20) $\begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix} = \begin{bmatrix} AA' + BC' & AB' + BD' \\ CA' + DC' & CB' + DD' \end{bmatrix}$.

For example,
In this product, the upper left block is $\begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 3 \\ 4 & 8 \end{bmatrix} + [5] \begin{bmatrix} 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 8 \end{bmatrix}$, etc.


Again, this rule can be verified directly from the definition of matrix
multiplication. In general, block multiplication can be used whenever two matrices are
decomposed into submatrices in such a way that the necessary products are defined.
Besides facilitating computations, block multiplication is a useful tool for
proving facts about matrices by induction.
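Block multiplication can be compared against the ordinary product on a small case. The following sketch checks rule (1.19), $MM' = AA' + BB'$, by splitting $M$ after its first $r = 2$ columns and $M'$ after its first 2 rows. The matrices `M` and `Mp` are illustrative choices made for this example, and `mat_mul`, `mat_add`, and `cols` are hypothetical helpers on list-of-rows matrices.

```python
def mat_mul(A, B):
    """Row-by-column matrix product of lists of rows."""
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def cols(M, start, stop):
    """The block of M consisting of columns start, ..., stop - 1."""
    return [row[start:stop] for row in M]


M = [[1, 0, 5],
     [0, 1, 7]]
Mp = [[2, 3, 1, 1],
      [4, 8, 3, 0],
      [0, 1, 0, 1]]

r = 2
A, B = cols(M, 0, r), cols(M, r, 3)   # M  = [A | B]
Ap, Bp = Mp[:r], Mp[r:]               # M' = [A' over B']

by_blocks = mat_add(mat_mul(A, Ap), mat_mul(B, Bp))   # AA' + BB'
direct = mat_mul(M, Mp)

# One block of the four-block rule (1.20): AA' + BC' for the upper left corner.
upper_left = mat_add(mat_mul([[1, 0]], [[2, 3], [4, 8]]), mat_mul([[5]], [[0, 1]]))

print(by_blocks, direct, upper_left)
```

The only requirement on the split is that the column cut of the left factor matches the row cut of the right factor, so that each block product is defined.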


2. ROW REDUCTION 

Let $A = (a_{ij})$ be an $m \times n$ matrix, and consider a variable $n \times p$ matrix $X = (x_{ij})$.
Then the matrix equation

(2.1) $Y = AX$

defines the $m \times p$ matrix $Y = (y_{ij})$ as a function of $X$. This operation is called left
multiplication by $A$:

(2.2) $y_{ij} = a_{i1}x_{1j} + \cdots + a_{in}x_{nj}$.

Notice that in formula (2.2) the entry $y_{ij}$ depends only on $x_{1j}, \ldots, x_{nj}$, that is, on the
$j$th column of $X$ and on the $i$th row of the matrix $A$. Thus $A$ operates separately on
each column of $X$, and we can understand the way $A$ operates by considering its
action on column vectors:

$$A \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$

Left multiplication by $A$ on column vectors can be thought of as a function
from the space of n-dimensional column vectors $X$ to the space of m-dimensional
column vectors $Y$, or a collection of $m$ functions of $n$ variables:

$$y_i = a_{i1}x_1 + \cdots + a_{in}x_n \qquad (i = 1, \ldots, m).$$

It is called a linear transformation, because the functions are homogeneous and
linear. (A linear function of a set of variables $u_1, \ldots, u_k$ is one of the form
$a_1u_1 + \cdots + a_ku_k + c$, where $a_1, \ldots, a_k, c$ are scalars. Such a function is
homogeneous linear if the constant term $c$ is zero.)


[Figure: the operation of a 2 × 2 matrix, which maps 2-space to 2-space.]
Going back to the operation of $A$ on an $n \times p$ matrix $X$, we can interpret the fact
that $A$ acts in the same way on each column of $X$ as follows: Let $Y_i$ denote the $i$th row
of $Y$, which we view as a row vector:

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{bmatrix}.$$

We can compute $Y_i$ in terms of the rows $X_j$ of $X$, in vector notation, as

(2.4) $Y_i = a_{i1}X_1 + \cdots + a_{in}X_n$.

This is just a restatement of (2.2), and it is another example of block multiplication.
For example, the bottom row of the product

$$\begin{bmatrix} 0 & -1 & 2 \\ 3 & 4 & -6 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 4 & 2 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & -4 \end{bmatrix}$$

can be computed as $3[1 \ \ 0] + 4[4 \ \ 2] - 6[3 \ \ 2] = [1 \ \ {-4}]$.

When $A$ is a square matrix, we often speak of left multiplication by $A$ as a row
operation.

The simplest nonzero matrices are the matrix units, which we denote by $e_{ij}$:

$$e_{ij} = \begin{bmatrix} & \vdots & \\ \cdots & 1 & \cdots \\ & \vdots & \end{bmatrix} \qquad (\text{the } 1 \text{ lies in row } i, \text{ column } j).$$

This matrix $e_{ij}$ has a 1 in the $(i, j)$ position as its only nonzero entry. (We usually
denote matrices by capital letters, but the use of a small letter for the matrix units is
traditional.) Matrix units are useful because every matrix $A = (a_{ij})$ can be written
out as a sum in the following way:

$$A = a_{11}e_{11} + a_{12}e_{12} + \cdots + a_{nn}e_{nn} = \sum_{i,j} a_{ij}e_{ij}.$$

The indices $i, j$ under the sigma mean that the sum is to be taken over all values of $i$
and all values of $j$. For instance

$$\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} = 3e_{11} + 2e_{12} + 1e_{21} + 4e_{22}.$$

Such a sum is called a linear combination of the matrices $e_{ij}$.

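The matrix units are convenient for the study of addition and scalar
multiplication of matrices. But to study matrix multiplication, some square matrices called
elementary matrices are more useful. There are three types of elementary matrix:

(2.6i)
$$\begin{bmatrix} 1 & & & \\ & \ddots & a & \\ & & \ddots & \\ & & & 1 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 1 & & & \\ & \ddots & & \\ & a & \ddots & \\ & & & 1 \end{bmatrix} = I + ae_{ij} \qquad (i \ne j).$$

Such a matrix has diagonal entries 1 and one nonzero off-diagonal entry.

(2.6ii)
$$\begin{bmatrix} 1 & & & & \\ & 0 & \cdots & 1 & \\ & \vdots & \ddots & \vdots & \\ & 1 & \cdots & 0 & \\ & & & & 1 \end{bmatrix} = I + e_{ij} + e_{ji} - e_{ii} - e_{jj}.$$

Here the $i$th and $j$th diagonal entries of $I$ are replaced by zero, and two 1's are
added in the $(i, j)$ and $(j, i)$ positions. (The formula in terms of the matrix units is
rather ugly, and we won’t use it much.)

(2.6iii)
$$\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix} = I + (c - 1)e_{ii}, \qquad (c \ne 0).$$

One diagonal entry of the identity matrix is replaced by a nonzero number $c$.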
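The elementary $2 \times 2$ matrices are

(i) $\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ a & 1 \end{bmatrix}$, (ii) $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, (iii) $\begin{bmatrix} c & 0 \\ 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 0 & c \end{bmatrix}$,

where, as above, $a$ is arbitrary and $c$ is an arbitrary nonzero number.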

The elementary matrices E operate on a matrix X as described below. 

(2.7) To get the matrix $EX$, you must:

Type (i): Replace the $i$th row $X_i$ by $X_i + aX_j$, or
add $a \cdot$(row $j$) to (row $i$);

Type (ii): Interchange (row $i$) and (row $j$);

Type (iii): Multiply (row $i$) by a nonzero scalar $c$.

These operations are called elementary row operations. Thus multiplication by an
elementary matrix is an elementary row operation. You should verify these rules of
multiplication carefully.
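One way to verify the rules (2.7) is to build each type of elementary matrix and multiply. The sketch below assumes list-of-rows matrices and 0-based row indices (so row 0 is the first row); the constructor names `elem_add`, `elem_swap`, and `elem_scale` are hypothetical, and the test matrix `X` is an arbitrary choice.

```python
def mat_mul(A, B):
    """Row-by-column matrix product of lists of rows."""
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(len(B[0]))]
            for i in range(len(A))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def elem_add(n, i, j, a):
    """Type (i): I + a*e_ij with i != j."""
    E = identity(n)
    E[i][j] = a
    return E


def elem_swap(n, i, j):
    """Type (ii): the identity with rows i and j interchanged."""
    E = identity(n)
    E[i][i] = E[j][j] = 0
    E[i][j] = E[j][i] = 1
    return E


def elem_scale(n, i, c):
    """Type (iii): I + (c - 1)*e_ii with c != 0."""
    E = identity(n)
    E[i][i] = c
    return E


X = [[1, 0], [4, 2], [3, 2]]

added = mat_mul(elem_add(3, 0, 2, 2), X)     # adds 2*(row 2) to (row 0)
swapped = mat_mul(elem_swap(3, 0, 1), X)     # interchanges rows 0 and 1
scaled = mat_mul(elem_scale(3, 2, 5), X)     # multiplies row 2 by 5

# Each inverse is the elementary matrix of the inverse row operation.
undo_add = mat_mul(elem_add(3, 0, 2, -2), elem_add(3, 0, 2, 2))

print(added, swapped, scaled, undo_add)
```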

(2.8) Lemma. Elementary matrices are invertible, and their inverses are also
elementary matrices.

The proof of this lemma is just a calculation. The inverse of an elementary
matrix is the matrix corresponding to the inverse row operation: If $E = I + ae_{ij}$ is of
Type (i), then $E^{-1} = I - ae_{ij}$; “subtract $a \cdot$(row $j$) from (row $i$)”. If $E$ is of Type (ii),
then $E^{-1} = E$, and if $E$ is of Type (iii), then $E^{-1}$ is of the same type, with $c^{-1}$ in the
position that $c$ has in $E$; “multiply (row $i$) by $c^{-1}$”. □

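We will now study the effect of elementary row operations (2.7) on a matrix $A$,
with the aim of ending up with a simpler matrix $A'$:

$$A \longrightarrow \cdots \longrightarrow A' \qquad (\text{sequence of operations}).$$

Since each elementary row operation is obtained as the result of multiplication by an
elementary matrix, we can express the result of a succession of such operations as
multiplication by a sequence $E_1, \ldots, E_k$ of elementary matrices:

(2.9) $A' = E_k \cdots E_2E_1A$.

This procedure is called row reduction, or Gaussian elimination. For example, we
can simplify the matrix

(2.10) $M = \begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 1 & 1 & 5 & 2 & 7 \\ 1 & 2 & 8 & 4 & 12 \end{bmatrix}$

by using the first type of elementary operation to clear out as many entries as
possible:

$$\begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 1 & 1 & 5 & 2 & 7 \\ 1 & 2 & 8 & 4 & 12 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 1 & 2 & 8 & 4 & 12 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 2 & 6 & 3 & 7 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 1 & 5 \\ 0 & 1 & 3 & 1 & 2 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 0 & 2 \\ 0 & 1 & 3 & 0 & -1 \\ 0 & 0 & 0 & 1 & 3 \end{bmatrix}.$$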
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Row reduction is a useful method of solving systems of linear equations t Sup¬ 
pose we are given a system of m equations in n unknowns, say AY = B as in (1,9 )， 
where A is m mx n matrix, X is an unknown column vector, and B is a given 
column vector. To solve this system, we form the w x (n + 1) block matrix 

an • • • am b\ 

(2.11) M = [a\b] = • - 

* * ■ 

Clml * dmn bn 

一 

and we perform row operations to simplify M. Note that EM = [EA \ EB]. Let 

M' = [A’|B’] 

be the result of a sequence of row operations. The key observation follows: 

(2.12) Proposition. The solutions of A'X = B' are the same as those of AX = B.

Proof. Since M' is obtained by a sequence of elementary row operations,

        $M' = E_r \cdots E_1 M.$

Let $P = E_r \cdots E_1$. This matrix is invertible, by Lemma (2.8) and Proposition (1.18).
Also, M' = [A'|B'] = [PA|PB]. If X is a solution of the original system AX = B,
then PAX = PB, which is to say, A'X = B'. So X also solves the new system.
Conversely, if A'X = B', then $AX = P^{-1}A'X = P^{-1}B' = B$, so X solves the system
AX = B too. □

For example, consider the system

(2.13)  $\begin{aligned} x_1 \phantom{{}+ 2x_2} + 2x_3 + x_4 &= 5 \\ x_1 + x_2 + 5x_3 + 2x_4 &= 7 \\ x_1 + 2x_2 + 8x_3 + 4x_4 &= 12. \end{aligned}$

Its augmented matrix is the matrix M considered above (2.10), so our row reduction
of this matrix shows that this system of equations is equivalent to

        $\begin{aligned} x_1 + 2x_3 &= 2 \\ x_2 + 3x_3 &= -1 \\ x_4 &= 3. \end{aligned}$

We can read off the solutions of this system immediately: We may choose $x_3$
arbitrarily and then solve for $x_1$, $x_2$, and $x_4$. The general solution of (2.13) can
therefore be written in the form

        $x_3 = c_3, \quad x_1 = 2 - 2c_3, \quad x_2 = -1 - 3c_3, \quad x_4 = 3,$

where $c_3$ is arbitrary.
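
As a quick sanity check, the solution family can be substituted back into the system. This is only a sketch: the coefficients and right-hand sides 5, 7, 12 are taken from the rows of the augmented matrix (2.10), the family is the one that row reduction of that matrix gives (x_1 = 2 - 2c_3, x_2 = -1 - 3c_3, x_3 = c_3, x_4 = 3), and the function names are hypothetical.

```python
# Coefficient matrix and right-hand side of the system, read from (2.10).
A = [[1, 0, 2, 1],
     [1, 1, 5, 2],
     [1, 2, 8, 4]]
B = [5, 7, 12]

def general_solution(c3):
    """The solution family read off from the reduced system."""
    return [2 - 2 * c3, -1 - 3 * c3, c3, 3]

def satisfies(A, X, B):
    """True if AX = B holds row by row."""
    return all(sum(a * x for a, x in zip(row, X)) == b for row, b in zip(A, B))

all_ok = all(satisfies(A, general_solution(c3), B) for c3 in range(-5, 6))
print(all_ok)
```

Trying several integer values of c_3 is not a proof, but it catches arithmetic slips in the constants.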

We now go back to row reduction of an arbitrary matrix. It is not hard to see
that, by a sequence of row operations, any matrix A can be reduced to one which
looks roughly like this:

(2.14)  $A' = \begin{bmatrix}
        1 & * & \cdots & * & 0 & * & \cdots & * & 0 & * & \cdots & * \\
          &   &        &   & 1 & * & \cdots & * & 0 & * & \cdots & * \\
          &   &        &   &   &   &        &   & 1 & * & \cdots & * \\
          &   &        &   &   &   &        &   &   &   & \ddots &
        \end{bmatrix},$

where * denotes an arbitrary number and the large blank space consists of zeros.
This is called a row echelon matrix. For instance,

        $\begin{bmatrix} 1 & 6 & 0 & 1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

is a row echelon matrix. So is the end result of our reduction of (2.10). The
definition of a row echelon matrix is given in (2.15):

(2.15)

(a) The first nonzero entry in every row is 1. This entry is called a pivot.

(b) The first nonzero entry of row i + 1 is to the right of the first nonzero
entry of row i.

(c) The entries above a pivot are zero.

To make a row reduction, find the first column which contains a nonzero entry.
(If there is none, then A = 0, and 0 is a row echelon matrix.) Interchange rows
using an elementary operation of Type (ii), moving a nonzero entry to the top row.
Normalize this entry to 1 using an operation of Type (iii). Then clear out the other
entries in its column by a sequence of operations of Type (i). The resulting matrix
will have the block form

        $\begin{bmatrix} 0 & \cdots & 0 & 1 & * & \cdots & * \\ 0 & \cdots & 0 & 0 & * & \cdots & * \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & * & \cdots & * \end{bmatrix}$, which we may write as $\left[\begin{array}{c|c|c} 0 \cdots 0 & 1 & B \\ \hline 0 \cdots 0 & 0 & D \end{array}\right] = A'.$

We now continue, performing row operations on the smaller matrix D (cooking until
done). Formally, this is induction on the size of the matrix. The principle of
complete induction [see Appendix (2.6)] allows us to assume that every matrix with
fewer rows than A can be reduced to row echelon form. Since D has fewer rows, we
may assume that it can be reduced to a row echelon matrix, say D''. The row
operations we perform to reduce D to D'' will not change the other blocks making up A'.
Therefore A' can be reduced to the matrix

        $\left[\begin{array}{c|c|c} 0 \cdots 0 & 1 & B \\ \hline 0 \cdots 0 & 0 & D'' \end{array}\right],$

which satisfies requirements (2.15a and b) for a row echelon matrix. Therefore our
original matrix A can be reduced to this form. The entries in B above the pivots of D''
can be cleared out at this time, to finish the reduction to row echelon form. □
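
The procedure just described translates fairly directly into code. The sketch below is one possible rendering, not the text's own: the name `row_reduce` is hypothetical, exact arithmetic with `fractions.Fraction` is assumed, and the loop replaces the induction on the block D. Unlike the proof, it clears the entries above each pivot as soon as the pivot is found instead of at the end.

```python
from fractions import Fraction

def row_reduce(rows):
    """Reduce a matrix (list of rows) to a row echelon matrix as in (2.15)."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    top = 0                                   # rows above `top` are finished
    for col in range(n):
        # Look for a nonzero entry in this column, at or below row `top`.
        pivot = next((r for r in range(top, m) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[top], A[pivot] = A[pivot], A[top]               # Type (ii)
        c = A[top][col]
        A[top] = [x / c for x in A[top]]                  # Type (iii)
        for r in range(m):                                # Type (i)
            if r != top and A[r][col] != 0:
                a = A[r][col]
                A[r] = [x - a * y for x, y in zip(A[r], A[top])]
        top += 1
        if top == m:
            break
    return A

M = [[1, 0, 2, 1, 5],
     [1, 1, 5, 2, 7],
     [1, 2, 8, 4, 12]]
reduced = row_reduce(M)
print([[int(x) for x in row] for row in reduced])
```

Applied to the matrix (2.10), this should reproduce the final matrix of the hand reduction.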


It can be shown that the row echelon matrix obtained from a given matrix A by
row reduction is unique, that is, that it does not depend on the particular sequence of
operations used. However, this is not a very important point, so we omit the proof.

The reason that row reduction is useful is that we can solve a system of
equations A'X = B' immediately if A' is in row echelon form. For example, suppose that

        $[A'|B'] = \left[\begin{array}{cccc|c} 1 & 6 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array}\right].$

There is no solution to A'X = B' because the third equation is 0 = 1. On the other
hand,

        $[A'|B'] = \left[\begin{array}{cccc|c} 1 & 6 & 0 & 1 & 1 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$

has solutions. Choosing $x_2$, $x_4$ arbitrarily, we can solve the first equation for $x_1$ and
the second for $x_3$. This is the procedure we use to solve system (2.13).

The general rule is as follows: 

(2.16) Proposition. Let M' = [A'|B'] be a row echelon matrix. Then the system
of equations A'X = B' has a solution if and only if there is no pivot in the last
column. In that case, an arbitrary value can be assigned to the unknown $x_i$ if
column i does not contain a pivot. □

Of course every homogeneous linear system AX = 0 has the trivial solution
X = 0. But looking at the row echelon form again, we can conclude that if there are
more unknowns than equations then the homogeneous equation AX = 0 has a
nontrivial solution for X:

(2.17) Corollary. Every system AX = 0 of m homogeneous equations in n
unknowns, with m < n, has a solution X in which some $x_i$ is nonzero.

For, let A'X = 0 be the associated row echelon equation, and let r be the number of
pivots of A'. Then r ≤ m. According to the proposition, we may assign arbitrary
values to n - r variables $x_i$, and n - r > 0 because r ≤ m < n. □
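
The criterion of Proposition (2.16) is easy to mechanize once a matrix is already in row echelon form. The sketch below assumes exactly that; the function names are hypothetical, columns are numbered from 0, and the two test matrices are row echelon matrices of the same shape as the [A'|B'] examples above, one with and one without a pivot in the last column.

```python
def pivot_columns(M):
    """Column index of the first nonzero entry of each nonzero row."""
    cols = []
    for row in M:
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if nonzero:
            cols.append(nonzero[0])
    return cols

def describe(M):
    """M = [A'|B'] is assumed to be a row echelon matrix already.

    Returns (solvable, free), where `free` lists the 0-based indices of the
    unknowns that may be assigned arbitrary values.
    """
    last = len(M[0]) - 1
    pivots = pivot_columns(M)
    solvable = last not in pivots
    free = [j for j in range(last) if j not in pivots] if solvable else []
    return solvable, free

inconsistent = [[1, 6, 0, 1, 0],
                [0, 0, 1, 2, 0],
                [0, 0, 0, 0, 1]]
consistent = [[1, 6, 0, 1, 1],
              [0, 0, 1, 2, 3],
              [0, 0, 0, 0, 0]]
print(describe(inconsistent))
print(describe(consistent))
```

For the second matrix the free indices 1 and 3 correspond to the unknowns x_2 and x_4 in the 1-based numbering of the text.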

We will now use row reduction to characterize square invertible matrices. 

(2.18) Proposition. Let A be a square matrix. The following conditions are equiva¬ 
lent: 

(a) A can be reduced to the identity by a sequence of elementary row operations. 

(b) A is a product of elementary matrices. 

(c) A is invertible. 

(d) The system of homogeneous equations AX = 0 has only the trivial solution 
X = 0. 

Proof. We will prove this proposition by proving the implications
(a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (a). To show that (a) implies (b), suppose that A can be
reduced to the identity by row operations: $E_k \cdots E_1 A = I$. Multiplying both sides of
this equation on the left by $E_1^{-1} \cdots E_k^{-1}$, we obtain $A = E_1^{-1} \cdots E_k^{-1}$. Since the
inverse of an elementary matrix is elementary, this shows that A is a product of
elementary matrices. Because a product of elementary matrices is invertible, (b)
implies (c). If A is invertible we can multiply both sides of the equation AX = 0 by
$A^{-1}$ to derive X = 0. So the equation AX = 0 has only the trivial solution. This
shows that (c) implies (d).

To prove the last implication, that (d) implies (a), we take a look at square row
echelon matrices M. We note the following dichotomy:

(2.19) Let M be a square row echelon matrix.

Either M is the identity matrix, or its bottom row is zero.

This is easy to see, from (2.15).

Suppose that (a) does not hold for a given matrix A. Then A can be reduced by
row operations to a matrix A' whose bottom row is zero. In this case there are at
most n - 1 nontrivial equations in the linear system A'X = 0, and so Corollary (2.17)
tells us that this system has a nontrivial solution. Since the equation AX = 0
is equivalent to A'X = 0, it has a nontrivial solution as well. This shows that if (a)
fails then (d) does too; hence (d) implies (a). This completes the proof of Proposition
(2.18). □

(2.20) Corollary. If a row of a square matrix A is zero, then A is not invertible. □

Row reduction provides a method of computing the inverse of an invertible
matrix A: We reduce A to the identity by row operations:

        $E_k \cdots E_1 A = I$

as above. Multiplying both sides of this equation on the right by $A^{-1}$, we have

        $E_k \cdots E_1 I = A^{-1}.$


(2.21) Corollary. Let A be an invertible matrix. To compute its inverse $A^{-1}$, apply
elementary row operations $E_1, \ldots, E_k$ to A, reducing it to the identity matrix. The
same sequence of operations, when applied to I, yields $A^{-1}$.

The corollary is just a restatement of the two equations. □


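(2.22) Example. We seek the inverse of the matrix

        $A = \begin{bmatrix} 5 & 4 \\ 6 & 5 \end{bmatrix}.$

To compute it we form the 2 × 4 block matrix

        $[A|I] = \left[\begin{array}{cc|cc} 5 & 4 & 1 & 0 \\ 6 & 5 & 0 & 1 \end{array}\right].$

We perform row operations to reduce A to the identity, carrying the right side along,
and thereby end up with $A^{-1}$ on the right because of Corollary (2.21).

        $[A|I] = \left[\begin{array}{cc|cc} 5 & 4 & 1 & 0 \\ 6 & 5 & 0 & 1 \end{array}\right]$

        $\to \left[\begin{array}{cc|cc} 5 & 4 & 1 & 0 \\ 1 & 1 & -1 & 1 \end{array}\right]$      Subtract (row 1) from (row 2)

        $\to \left[\begin{array}{cc|cc} 1 & 0 & 5 & -4 \\ 1 & 1 & -1 & 1 \end{array}\right]$     Subtract 4·(row 2) from (row 1)

        $\to \left[\begin{array}{cc|cc} 1 & 0 & 5 & -4 \\ 0 & 1 & -6 & 5 \end{array}\right]$     Subtract (row 1) from (row 2)

        $= [I|A^{-1}].$

Thus $A^{-1} = \begin{bmatrix} 5 & -4 \\ -6 & 5 \end{bmatrix}.$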
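
The method of Corollary (2.21) can be sketched in code. This is an illustration under stated assumptions, not the text's own algorithm: the name `inverse` is hypothetical, exact arithmetic uses `fractions.Fraction`, and the test matrix [[5, 4], [6, 5]] is the one that the three row operations listed in Example (2.22) carry to the identity. The code chooses its own sequence of row operations, which generally differs from the hand computation but ends at the same inverse.

```python
from fractions import Fraction

def inverse(A):
    """Invert a square matrix by row reducing the block matrix [A | I]."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]               # Type (ii)
        c = M[col][col]
        M[col] = [x / c for x in M[col]]                  # Type (iii)
        for r in range(n):                                # Type (i)
            if r != col:
                a = M[r][col]
                M[r] = [x - a * y for x, y in zip(M[r], M[col])]
    # The left block is now I, so the right block is the inverse.
    return [row[n:] for row in M]

A = [[5, 4], [6, 5]]
A_inv = inverse(A)
print([[int(x) for x in row] for row in A_inv])
```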



(2.23) Proposition. Let A be a square matrix which has either a left inverse B:
BA = I, or a right inverse: AB = I. Then A is invertible, and B is its inverse.

Proof. Suppose that AB = I. We perform row reduction on A. According to
(2.19), there are elementary matrices $E_1, \ldots, E_k$ so that $A' = E_k \cdots E_1 A$ either is the
identity matrix or has bottom row zero. Then $A'B = E_k \cdots E_1$, which is an invertible
matrix. Hence the bottom row of A'B is not zero, and it follows that A' has a nonzero
bottom row too. So A' = I. By (2.18), A is invertible, and the equations
$I = E_k \cdots E_1 A$ and AB = I show that $A^{-1} = E_k \cdots E_1 = B$ (see (1.17)). The other case
is that BA = I. Then we can interchange A and B in the above argument and
conclude that B is invertible and A is its inverse. So A is invertible too. □

For most of this discussion, we could have worked with columns rather than
rows. We chose to work with rows in order to apply the results to systems of linear
equations; otherwise columns would have served just as well. Rows and columns are
interchanged by the matrix transpose. The transpose of an m × n matrix A is the
n × m matrix $A^t$ obtained by reflecting about the diagonal: $A^t = (b_{ij})$, where

        $b_{ij} = a_{ji}.$

For instance,

        $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^t = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}.$


The rules for computing with the transpose are given in (2.24):

(2.24)

(a) $(A + B)^t = A^t + B^t$.

(b) $(cA)^t = cA^t$.

(c) $(AB)^t = B^t A^t$!

(d) $(A^t)^t = A$.

Using formulas (2.24c and d), we can deduce facts about right multiplication,
XP, from the corresponding facts about left multiplication.

The elementary matrices (2.6) act by right multiplication as the following
elementary column operations:

(2.25)

(a) Add a·(column i) to (column j).

(b) Interchange (column i) and (column j).

(c) Multiply (column i) by c ≠ 0.
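
Rules (2.24c, d) and the column operation (2.25a) can be spot-checked on small matrices. The following is a sketch with arbitrarily chosen entries; the helper names are hypothetical, and the elementary matrix used is $I + 5e_{12}$ in the 1-based notation of the text.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]          # a 2 x 3 matrix
B = [[1, 0],
     [2, 1],
     [0, 3]]             # a 3 x 2 matrix

rule_c = transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))
rule_d = transpose(transpose(A)) == A

# Right multiplication by E = I + 5*e_12 adds 5*(column 1) to (column 2).
E = [[1, 5, 0],
     [0, 1, 0],
     [0, 0, 1]]
AE = matmul(A, E)
print(rule_c, rule_d, AE)
```

Note the reversal of order in rule (c): the product $A^t B^t$ is not even defined for these shapes unless the factors are swapped.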


3. DETERMINANTS 

Every square matrix A has a number associated to it called its determinant. In this
section we will define the determinant and derive some of its properties. The
determinant of a matrix A will be denoted by det A.

The determinant of a 1 × 1 matrix is just its unique entry

(3.1)   det [a] = a,

and the determinant of a 2 × 2 matrix is given by the formula

(3.2)   $\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc.$


If we think of a 2 × 2 matrix A as an operator on the space $\mathbb{R}^2$ of real
two-dimensional vectors, as in Section 2, then det A can be interpreted geometrically. Its
absolute value is the area of the parallelogram which forms the image of a unit
square under the operation. For example, the area of the shaded region of Figure
(2.3) is 10. The determinant is positive or negative according to whether the
orientation of the square is preserved or reversed by the operation. Moreover, det A = 0 if
and only if the parallelogram degenerates to a line segment, and this occurs if and
only if the two columns of A are proportional.

The set of all n × n matrices forms a space of dimension $n^2$, which we denote
by $\mathbb{R}^{n \times n}$. We will regard the determinant of n × n matrices as a function from this
space to the real numbers:

        $\det: \mathbb{R}^{n \times n} \to \mathbb{R}.$


This just means that det is a function of the $n^2$ matrix entries. There is one such
function for each positive integer n. Unfortunately there are many formulas for the
determinant, and all of them are complicated when n is large. The determinant is
important because it has very nice properties, though there is no simple formula for
it. Not only are the formulas complicated, but it may not be easy to show directly
that two of them define the same function. So we will use the following strategy: We
choose one formula essentially at random and take it as the definition of the
determinant. In that way we are talking about a particular function. We show that the
function we have chosen has certain very special properties. We also show that our
chosen function is the only one having these properties. Then, to check that some other
formula defines the same determinant, we have to check only that the function which
it defines has these same properties. It turns out that this is usually relatively easy.

The determinant of an n × n matrix can be computed in terms of certain
(n - 1) × (n - 1) determinants by a process called expansion by minors. This
expansion allows us to give a recursive definition of the determinant function. Let A be
an n × n matrix and let $A_{ij}$ denote the (n - 1) × (n - 1) matrix obtained by crossing
out the ith row and the jth column of A:

(3.3)   $A_{ij}$ = (the matrix A with row i and column j deleted).

For example, if

        $A = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 2 \\ 0 & 5 & 1 \end{bmatrix}$, then $A_{21} = \begin{bmatrix} 0 & 3 \\ 5 & 1 \end{bmatrix}.$

Expansion by minors on the first column is the formula

(3.4)   $\det A = a_{11} \det A_{11} - a_{21} \det A_{21} + \cdots \pm a_{n1} \det A_{n1}.$

The signs alternate. We take this formula, together with (3.1), as a recursive
definition of the determinant. Notice that the formula agrees with (3.2) for 2 × 2
matrices.

The determinant of the matrix A shown above is

        $\det A = 1 \cdot \det \begin{bmatrix} 1 & 2 \\ 5 & 1 \end{bmatrix} - 2 \cdot \det \begin{bmatrix} 0 & 3 \\ 5 & 1 \end{bmatrix} + 0 \cdot \det \begin{bmatrix} 0 & 3 \\ 1 & 2 \end{bmatrix}.$

The three 2 × 2 determinants which appear here can be computed by expanding by
minors again and using (3.1), or by using (3.2), to get

        $\det A = 1 \cdot (-9) - 2 \cdot (-15) + 0 \cdot (-3) = 21.$

There are other formulas for the determinant, including expansions by minors on
other columns and on rows, which we will derive presently [see (4.11, 5.1, 5.2)].
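
The recursive definition (3.1), (3.4) can be written out as a short program. This is a minimal sketch: the names `minor` and `det` are hypothetical, indices are 0-based, and no attempt is made at efficiency (the recursion takes on the order of n! steps). The test matrix is the 3 × 3 matrix whose determinant is evaluated above as 1·(-9) - 2·(-15) + 0·(-3) = 21.

```python
def minor(A, i, j):
    """The matrix A_ij of (3.3): delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion by minors on the first column, as in (3.4)."""
    if len(A) == 1:
        return A[0][0]                       # the 1 x 1 case (3.1)
    # Signs alternate down the first column: +, -, +, ...
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0))
               for i in range(len(A)))

A = [[1, 0, 3],
     [2, 1, 2],
     [0, 5, 1]]
print(det(A))
```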

It is important, both for computation of determinants and for theoretical con¬ 
siderations, to know some of the many special properties satisfied by determinants. 
Most of them can be verified by direct computation and induction on n, using expan¬ 
sion by minors (3.4). We will list some without giving formal proofs. In order to be 
able to interpret these properties for functions other than the determinant, we will 
denote the determinant by the symbol d for the time being. 

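(3.5)   d(I) = 1.

(3.6)   The function d(A) is linear in the rows of the matrix.

By this we mean the following: Let $R_i$ denote the row vector which is the ith row of
the matrix, so that A can be written symbolically as

        $A = \begin{bmatrix} - R_1 - \\ \vdots \\ - R_n - \end{bmatrix}.$

By definition, linearity in the ith row means that whenever R and S are row vectors
then

        $d \begin{bmatrix} \vdots \\ - R + S - \\ \vdots \end{bmatrix} = d \begin{bmatrix} \vdots \\ - R - \\ \vdots \end{bmatrix} + d \begin{bmatrix} \vdots \\ - S - \\ \vdots \end{bmatrix}$

and

        $d \begin{bmatrix} \vdots \\ - cR - \\ \vdots \end{bmatrix} = c\, d \begin{bmatrix} \vdots \\ - R - \\ \vdots \end{bmatrix},$

where the other rows of the matrices appearing in these relations are the same
throughout. For example,

        $\det \begin{bmatrix} 1 & 2 & 4 \\ 3+5 & 4+6 & 2+3 \\ 2 & -1 & 0 \end{bmatrix} = \det \begin{bmatrix} 1 & 2 & 4 \\ 3 & 4 & 2 \\ 2 & -1 & 0 \end{bmatrix} + \det \begin{bmatrix} 1 & 2 & 4 \\ 5 & 6 & 3 \\ 2 & -1 & 0 \end{bmatrix}$

and

        $\det \begin{bmatrix} 1 & 2 & 4 \\ 2 \cdot 5 & 2 \cdot 6 & 2 \cdot 3 \\ 2 & -1 & 0 \end{bmatrix} = 2 \cdot \det \begin{bmatrix} 1 & 2 & 4 \\ 5 & 6 & 3 \\ 2 & -1 & 0 \end{bmatrix}.$

Linearity allows us to operate on one row at a time, with the other rows left fixed.
Another property: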


(3.7)   If two adjacent rows of a matrix A are equal, then d(A) = 0.


Let us prove this fact by induction on n. Suppose that rows j and j + 1 are equal.
Then the matrices $A_{i1}$ defined by (3.3) also have two rows equal, except when i = j
or i = j + 1. When $A_{i1}$ has two equal rows, its determinant is zero by induction.
Thus only two terms of (3.4) are different from zero, and

        $d(A) = \pm a_{j1}\, d(A_{j1}) \mp a_{j+1,1}\, d(A_{j+1,1}).$

Moreover, since the rows $R_j$ and $R_{j+1}$ are equal, it follows that $A_{j1} = A_{j+1,1}$ and that
$a_{j1} = a_{j+1,1}$. Since the signs alternate, the two terms on the right side cancel, and
the determinant is zero.

Properties (3.5-3.7) characterize determinants uniquely [see (3.14)], and we
will derive further relations from them without going back to definition (3.4).


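(3.8)   If a multiple of one row is added to an adjacent row,
        the determinant is unchanged.


For, by (3.6) and (3.7),

        $d \begin{bmatrix} \vdots \\ - R - \\ - S + cR - \\ \vdots \end{bmatrix} = d \begin{bmatrix} \vdots \\ - R - \\ - S - \\ \vdots \end{bmatrix} + c\, d \begin{bmatrix} \vdots \\ - R - \\ - R - \\ \vdots \end{bmatrix} = d \begin{bmatrix} \vdots \\ - R - \\ - S - \\ \vdots \end{bmatrix}.$

The same reasoning works if S is above R.

(3.9)   If two adjacent rows are interchanged,
        the determinant is multiplied by -1.


We apply (3.8) repeatedly:

        $d \begin{bmatrix} - R - \\ - S - \end{bmatrix} = d \begin{bmatrix} - R - \\ - (S - R) - \end{bmatrix} = d \begin{bmatrix} - R + (S - R) - \\ - (S - R) - \end{bmatrix} = d \begin{bmatrix} - S - \\ - (S - R) - \end{bmatrix} = d \begin{bmatrix} - S - \\ - (-R) - \end{bmatrix} = -d \begin{bmatrix} - S - \\ - R - \end{bmatrix}.$
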

(3.7')  If two rows of a matrix A are equal, then d(A) = 0.

For, interchanging adjacent rows a few times results in a matrix A' with two adjacent
rows equal. By (3.7) d(A') = 0, and by (3.9) d(A) = ±d(A').

Using (3.7'), the proofs of (3.8) and (3.9) show the following:


(3.8')  If a multiple of one row is added to another row,
        the determinant is not changed.

(3.9')  If two rows are interchanged,
        the determinant is multiplied by -1.


Also, (3.6) implies the following:

(3.10)  If a row of A is zero, then d(A) = 0.


If a row is zero, then A doesn't change when we multiply that row by 0. But
according to (3.6), d(A) gets multiplied by 0. Thus d(A) = 0·d(A) = 0.

Rules (3.8'), (3.9'), and (3.6) describe the effect of an elementary row
operation (2.7) on the determinant, so they can be rewritten in terms of the elementary
matrices. They tell us that d(EA) = d(A) if E is an elementary matrix of the first
kind, that d(EA) = -d(A) if E is of the second kind, and (3.6) that d(EA) = c d(A) if
E is of the third kind. Let us apply these rules to compute d(E) when E is an
elementary matrix. We substitute A = I. Then, since d(I) = 1, the rules determine
d(EI) = d(E):

(3.11) The determinant of an elementary matrix is:

(i) First kind (add a multiple of one row to another): d(E) = 1, by (3.8').

(ii) Second kind (row interchange): d(E) = -1, by (3.9').

(iii) Third kind (multiply a row by a nonzero constant): d(E) = c, by (3.6).

Moreover, if we use rules (3.8'), (3.9'), and (3.6) again, applying them this time to
an arbitrary matrix A and using the values for d(E) which have just been determined,
we obtain the following:

(3.12) Let E be an elementary matrix and let A be arbitrary. Then

        d(EA) = d(E)d(A).

Recall from (2.19) that every square matrix A can be reduced by elementary
row operations to a matrix A' which is either the identity I or else has its bottom row
zero:

        $A' = E_k \cdots E_1 A.$

We know by (3.5) and (3.10) that d(A') = 1 or d(A') = 0 according to the case. By
(3.12) and induction,

(3.13)  $d(A') = d(E_k) \cdots d(E_1)\, d(A).$

We also know $d(E_i)$, by (3.11), and hence we can use this formula to compute d(A).

(3.14) Theorem. Axiomatic Characterization of the Determinant: The determinant
function (3.4) is the only one satisfying rules (3.5-3.7).

Proof. We used only these rules to arrive at equations (3.11) and (3.13), and
they determine d(A). Since the expansion by minors (3.4) satisfies (3.5-3.7), it
agrees with (3.13). □
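
Formula (3.13) also gives a practical way to compute a determinant: reduce A, keep track of the factors $d(E_i)$ from (3.11), and solve for d(A). The sketch below is one way to organize this; the function name is hypothetical, exact arithmetic with `fractions.Fraction` is assumed, and the early return uses the fact that a square matrix lacking a pivot in some column reduces to a matrix with bottom row zero.

```python
from fractions import Fraction

def det_by_row_reduction(rows):
    """Compute d(A) from d(A') = d(E_k)...d(E_1) d(A), as in (3.13)."""
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    product = Fraction(1)           # running value of d(E_k)...d(E_1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)      # A' has bottom row zero, so d(A) = 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            product *= -1           # row interchange: d(E) = -1
        c = A[col][col]
        A[col] = [x / c for x in A[col]]
        product *= 1 / c            # multiply a row by 1/c: d(E) = 1/c
        for r in range(n):
            if r != col:
                a = A[r][col]
                # Adding a multiple of one row to another: d(E) = 1.
                A[r] = [x - a * y for x, y in zip(A[r], A[col])]
    # Now A' = I, so 1 = product * d(A).
    return 1 / product

print(det_by_row_reduction([[1, 0, 3], [2, 1, 2], [0, 5, 1]]))
```

For large n this takes on the order of n^3 arithmetic operations, far fewer than the recursive expansion by minors.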

We will now return to our usual notation det A for the determinant of a matrix. 

(3.15) Corollary. A square matrix A is invertible if and only if det A ≠ 0.

This follows from formulas (3.11), (3.13), and (2.18). By (3.11), $\det E_i \neq 0$ for all
i. Thus if A' is as in (3.13), then det A ≠ 0 if and only if det A' ≠ 0, which is the
case if and only if A' = I. By (2.18), A' = I if and only if A is invertible. □

We can now prove one of the most important properties of the determinant
function: its compatibility with matrix multiplications.

(3.16) Theorem. Let A, B be any two n × n matrices. Then

        det(AB) = (det A)(det B).

Proof. We note that this is (3.12) if A is an elementary matrix.

Case 1: A is invertible. By (2.18b), A is a product of elementary matrices:
$A = E_1 \cdots E_k$. By (3.12) and induction, $\det A = (\det E_1) \cdots (\det E_k)$, and
$\det AB = \det(E_1 \cdots E_k B) = (\det E_1) \cdots (\det E_k)(\det B) = (\det A)(\det B)$.

Case 2: A is not invertible. Then det A = 0 by (3.15), and so the theorem will
follow in this case if we show that det(AB) = 0 too. By (2.18), A can be reduced to a
matrix $A' = E_k \cdots E_1 A$ having bottom row zero. Then the bottom row of A'B is also
zero; hence

        $0 = \det(A'B) = \det(E_k \cdots E_1 AB) = (\det E_k) \cdots (\det E_1)(\det AB).$

Since $\det E_i \neq 0$, it follows that det AB = 0. □

(3.17) Corollary. If A is invertible, $\det(A^{-1}) = \dfrac{1}{\det A}$.

Proof. $(\det A)(\det A^{-1}) = \det I = 1$. □

Note. It is a natural idea to try to define determinants using rules (3.11) and
(3.16). These rules certainly determine det A for every invertible matrix A, since we
can write such a matrix as a product of elementary matrices. But there is a problem. 
Namely, there are many ways to write a given matrix as a product of elementary 
matrices. Without going through some steps as we have, it is not clear that two such 
products would give the same answer for the determinant. It is actually not particu¬ 
larly easy to make this idea work. 

The proof of the following proposition is a good exercise.

(3.18) Proposition. Let $A^t$ denote the transpose of A. Then

        $\det A = \det A^t$. □

(3.19) Corollary. Properties (3.6-3.10) continue to hold if the word row is
replaced by column throughout. □

4. PERMUTATION MATRICES

A bijective map p from a set S to itself is called a permutation of the set:

(4.1)   $p: S \to S.$

For example,

(4.2)   $1 \mapsto 3, \quad 2 \mapsto 1, \quad 3 \mapsto 2$

is a permutation of the set {1, 2, 3}. It is called a cyclic permutation because it
operates as

        $1 \to 3 \to 2 \to 1.$

There are several notations for permutations. We will use function notation in
this section, so that p(x) denotes the value of the permutation p on the element x.
Thus if p is the permutation given in (4.2), then

        p(1) = 3, p(2) = 1, p(3) = 2.

A permutation matrix P is a matrix with the following property: The operation
of left multiplication by P is a permutation of the rows of a matrix. The elementary
matrices of the second type (2.6ii) are the simplest examples. They correspond to
the permutations called transpositions, which interchange two rows of a matrix,
leaving the others alone. Also,

        $P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$

is a permutation matrix. It acts on a column vector $X = (x, y, z)^t$ as

(4.3)   $PX = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} y \\ z \\ x \end{bmatrix}.$

The entry in the first position is sent to the third position, and so on, so P has
permuted rows according to the cyclic permutation p given in (4.2).

There is one point which can cause confusion and which makes it important
for us to establish our notation carefully. When we permute the entries of a vector
$(x_1, \ldots, x_n)^t$ according to a permutation p, the indices are permuted in the opposite
way. For instance, multiplying the column vector $X = (x_1, x_2, x_3)^t$ by the matrix in
(4.3) gives

(4.4)   $PX = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 \\ x_3 \\ x_1 \end{bmatrix}.$

The indices in (4.4) are permuted by $1 \to 2 \to 3 \to 1$, which is the inverse of
the permutation p. Thus there are two ways to associate a permutation to a
permutation matrix P: the permutation p which describes how P permutes the entries, and the
inverse operation which describes the effect on indices. We must make a decision, so
we will say that the permutation associated to P is the one which describes its action
on the entries of a column vector. Then the indices are permuted in the opposite
way, so

(4.5)   $PX = \begin{bmatrix} x_{p^{-1}(1)} \\ \vdots \\ x_{p^{-1}(n)} \end{bmatrix}.$

Multiplication by P has the corresponding effect on the rows of an n × r matrix A.

The permutation matrix P can be written conveniently in terms of the matrix
units (2.5) or in terms of certain column vectors called the standard basis and
denoted by $e_i$. The vector $e_i$ has a 1 in the ith position as its single nonzero entry, so
these vectors are the matrix units for an n × 1 matrix.

(4.6) Proposition. Let P be the permutation matrix associated to a permutation p.

(a) The jth column of P is the column vector $e_{p(j)}$.

(b) P is a sum of n matrix units: $P = e_{p(1)1} + \cdots + e_{p(n)n} = \sum_j e_{p(j)j}$. □

A permutation matrix P always has a single 1 in each row and in each column,
the rest of its entries being 0. Conversely, any such matrix is a permutation matrix.
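
Proposition (4.6a) gives a direct recipe for building P from p. The sketch below follows that recipe; the function names and the dictionary representation of p are choices made here for illustration, and the numeric vector stands in for $(x_1, x_2, x_3)^t$.

```python
def permutation_matrix(p):
    """p is a dict on {1, ..., n}; column j of P is e_{p(j)}, as in (4.6a)."""
    n = len(p)
    P = [[0] * n for _ in range(n)]
    for j in range(1, n + 1):
        P[p[j] - 1][j - 1] = 1      # a single 1 in row p(j) of column j
    return P

def apply(P, X):
    """The product PX for a column vector X given as a list."""
    return [sum(a * x for a, x in zip(row, X)) for row in P]

p = {1: 3, 2: 1, 3: 2}              # the cyclic permutation (4.2)
P = permutation_matrix(p)
print(P)
print(apply(P, [10, 20, 30]))
```

In the result the entry 10, which started in position 1, ends in position 3 = p(1), while position 1 of PX holds the entry with index $p^{-1}(1) = 2$, as in (4.5).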

(4.7) Proposition.

(a) Let p, q be two permutations, with associated permutation matrices P, Q. Then
the matrix associated to the permutation pq is the product PQ.

(b) A permutation matrix P is invertible, and its inverse is the transpose matrix:

        $P^{-1} = P^t.$

Proof. By pq we mean the composition of the two permutations

(4.8)   pq(i) = p(q(i)).

Since P operates by permuting rows according to p and Q operates by permuting
according to q, the associative law for matrix multiplication tells us that PQ permutes
according to pq:

        (PQ)X = P(QX).

Thus PQ is the permutation matrix associated to pq. This proves (a). We leave the
proof of (b) as an exercise. □

The determinant of a permutation matrix is easily seen to be ±1, using rule
(3.9'). This determinant is called the sign of a permutation:

(4.9)   sign p = det P = ±1.

The permutation (4.2) has sign +1, while any transposition has sign -1 [see
(3.11ii)]. A permutation p is called odd or even according to whether its sign is -1
or +1.
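
Both parts of Proposition (4.7) and the signs just mentioned can be checked on small examples. In this sketch a permutation is stored as a tuple with `p[j-1]` equal to p(j), all function names are hypothetical, and the sign is computed by counting inversions. That is a standard equivalent of det P which is swapped in here for brevity; it is not the definition (4.9) itself.

```python
def permutation_matrix(p):
    """p is a tuple with p[j-1] = p(j); column j of P is e_{p(j)}."""
    n = len(p)
    return [[1 if p[j] == i + 1 else 0 for j in range(n)] for i in range(n)]

def compose(p, q):
    """The permutation pq of (4.8): (pq)(i) = p(q(i))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def sign(p):
    """Sign via the parity of the number of inversions (assumed equal to det P)."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

p = (3, 1, 2)       # the cyclic permutation (4.2)
q = (2, 1, 3)       # a transposition
P, Q = permutation_matrix(p), permutation_matrix(q)
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

product_rule = matmul(P, Q) == permutation_matrix(compose(p, q))    # (4.7a)
inverse_rule = matmul(P, [list(c) for c in zip(*P)]) == identity    # (4.7b)
print(product_rule, inverse_rule, sign(p), sign(q))
```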

Let us now go back to an arbitrary n × n matrix A and use linearity of the
determinant (3.6) to expand det A. We begin by working on the first row. Applying
(3.6), we find that

        $\det A = \det \begin{bmatrix} a_{11} \; 0 \; \cdots \; 0 \\ - R_2 - \\ \vdots \\ - R_n - \end{bmatrix} + \det \begin{bmatrix} 0 \; a_{12} \; 0 \; \cdots \; 0 \\ - R_2 - \\ \vdots \\ - R_n - \end{bmatrix} + \cdots + \det \begin{bmatrix} 0 \; \cdots \; 0 \; a_{1n} \\ - R_2 - \\ \vdots \\ - R_n - \end{bmatrix}.$

We continue expanding each of these determinants on the second row, and so on.
When we are finished, det A is expressed as a sum of many terms, each of which is
the determinant of a matrix M having only one entry left in each row:

        $M = \begin{bmatrix} & a_{1?} & & \\ a_{2?} & & & \\ & & \ddots & \\ & & & a_{n?} \end{bmatrix}.$


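Many of these determinants will be zero because a whole column vanishes. Thus the
determinant of a 2 × 2 matrix is the sum of four terms:

        $\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \det \begin{bmatrix} a & 0 \\ c & d \end{bmatrix} + \det \begin{bmatrix} 0 & b \\ c & d \end{bmatrix}$

        $= \det \begin{bmatrix} a & 0 \\ c & 0 \end{bmatrix} + \det \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} + \det \begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix} + \det \begin{bmatrix} 0 & b \\ 0 & d \end{bmatrix}.$

But the first and fourth terms are zero; therefore

        $\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \det \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} + \det \begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix}.$
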

In fact, the matrices M having no column zero must have one entry $a_{ij}$ left in each
row and each column. They are like permutation matrices P, except that the 1's
in P are replaced by the entries of A:

(4.10)  $P = \sum_j e_{p(j)j}, \qquad M = \sum_j a_{p(j)j}\, e_{p(j)j}.$

By linearity of the determinant (3.6),

        $\det M = (a_{p(1)1} \cdots a_{p(n)n})(\det P) = (\operatorname{sign} p)(a_{p(1)1} \cdots a_{p(n)n}).$

There is one such term for each permutation p. This leads to the formula

(4.11)  $\det A = \sum_{\text{perm } p} (\operatorname{sign} p)\, a_{p(1)1} \cdots a_{p(n)n},$

where the sum is over all permutations of the set {1, ..., n}. It seems slightly nicer to
write this formula in its transposed form:

(4.12)  $\det A = \sum_{\text{perm } p} (\operatorname{sign} p)\, a_{1p(1)} \cdots a_{np(n)}.$

This is called the complete expansion of the determinant.

For example, the complete expansion of the determinant of a 3 × 3 matrix has
six terms:

(4.13)  $\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
        = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}.$


The complete expansion is more of theoretical than of practical importance,
because it has too many terms to be useful for computation unless n is small. Its
theoretical importance comes from the fact that determinants are exhibited as
polynomials in the $n^2$ variable matrix entries $a_{ij}$, with coefficients ±1. This has important
consequences. Suppose, for example, that each matrix entry $a_{ij}$ is a differentiable
function of a single variable: $a_{ij} = a_{ij}(t)$. Then det A is also a differentiable function
of t, because sums and products of differentiable functions are differentiable.
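
The complete expansion is a sum over all n! permutations of signed products with one factor from each row. A direct rendering in code follows as a sketch: the names are hypothetical, permutations are generated with `itertools.permutations` on 0-based indices, and the sign is computed by counting inversions, a standard equivalent assumed here in place of det P.

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation of (0, ..., n-1), by counting inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def det_complete_expansion(A):
    """Sum over all permutations p of (sign p) times the product of A[i][p(i)]."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]      # one factor from each row, column p(i)
        total += term
    return total

A = [[1, 0, 3],
     [2, 1, 2],
     [0, 5, 1]]
print(det_complete_expansion(A))
```

For this matrix only three of the six terms are nonzero: 1·1·1, 3·2·5 and -1·2·5.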


5. CRAMER’S RULE 

The name Cramer's Rule is applied to a group of formulas giving solutions of
systems of linear equations in terms of determinants. To derive these formulas we need
to use expansion by minors on columns other than the first one, as well as on rows.

(5.1) Expansion by minors on the jth column:

        $\det A = (-1)^{1+j} a_{1j} \det A_{1j} + (-1)^{2+j} a_{2j} \det A_{2j} + \cdots + (-1)^{n+j} a_{nj} \det A_{nj}.$

(5.2) Expansion by minors on the ith row:

        $\det A = (-1)^{i+1} a_{i1} \det A_{i1} + (-1)^{i+2} a_{i2} \det A_{i2} + \cdots + (-1)^{i+n} a_{in} \det A_{in}.$

In these formulas $A_{ij}$ is the matrix (3.3). The terms $(-1)^{i+j}$ provide alternating signs
depending on the position (i, j) in the matrix. (I doubt that such tricky notation is
really helpful, but it has become customary.) The signs can be read off of the
following figure:

(5.3)   $\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \end{bmatrix}$

To prove (5.1), one can proceed in either of two ways:

(a) Verify properties (3.5-3.7) for (5.1) directly and apply Theorem (3.14), or

(b) Interchange (column j) with (column 1) and apply (3.9') and (3.19).

We omit these verifications. Once (5.1) is proved, (5.2) can be derived from it by
transposing the matrix and applying (3.18).

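(5.4) Definition. Let A be an n × n matrix. The adjoint of A is the n × n matrix
whose (i, j) entry $(\mathrm{adj})_{ij}$ is $(-1)^{i+j} \det A_{ji} = \alpha_{ji}$, where $A_{ij}$ is the matrix obtained by
crossing out the ith row and the jth column, as in (3.3):

        $(\mathrm{adj}\, A) = (\alpha_{ij})^t,$

where $\alpha_{ij} = (-1)^{i+j} \det A_{ij}$. Thus

(5.5)   $\mathrm{adj} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$

and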

(5.6) 



We can now proceed to derive the formula called Cramer’s Rule. 


(5.7) Theorem. Let δ = det A. Then

        $(\mathrm{adj}\, A) \cdot A = \delta I$, and $A \cdot (\mathrm{adj}\, A) = \delta I.$

Note that in these equations

        $\delta I = \begin{bmatrix} \delta & & \\ & \ddots & \\ & & \delta \end{bmatrix}.$

(5.8) Corollary. Suppose that the determinant δ of A is not zero. Then

        $A^{-1} = \dfrac{1}{\delta} (\mathrm{adj}\, A).$

For example, the inverse of the 2 × 2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is

        $\dfrac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$

The determinant of the 3 × 3 matrix whose adjoint is computed in (5.6) happens to
be 1; therefore for that matrix, $A^{-1} = \mathrm{adj}\, A$.

The proof of Theorem (5.7) is easy. The (i, j) entry of (adj A)·A is

(5.9)   $(\mathrm{adj})_{i1} a_{1j} + \cdots + (\mathrm{adj})_{in} a_{nj} = \alpha_{1i} a_{1j} + \cdots + \alpha_{ni} a_{nj}.$

If i = j, this is formula (5.1) for δ, which is the required answer. Suppose i ≠ j.
Consider the matrix B obtained by replacing (column i) by (column j) in the matrix
A. So (column j) appears twice in the matrix B. Then (5.9) is expansion by minors
for B on its ith column. But det B = 0 by (3.7') and (3.19). So (5.9) is zero, as
required. The second equation of Theorem (5.7) is proved similarly. □
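
Theorem (5.7) can be checked numerically on a small integer matrix. The sketch below is illustrative only: the helper names are hypothetical, indices are 0-based, the determinant is computed by expansion on the first column, and the 3 × 3 test matrix is an arbitrary choice with determinant 21. The empty-matrix case in `det` is a convention added here so that the adjoint of a 1 × 1 matrix comes out as [1].

```python
def minor(A, i, j):
    """Delete row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Expansion by minors on the first column."""
    if len(A) == 0:
        return 1                    # convention for the empty matrix
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(len(A)))

def adjoint(A):
    """(adj A)_ij = (-1)^(i+j) det A_ji, as in Definition (5.4)."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 0, 3],
     [2, 1, 2],
     [0, 5, 1]]
delta = det(A)
delta_I = [[delta if i == j else 0 for j in range(3)] for i in range(3)]
left_ok = matmul(adjoint(A), A) == delta_I
right_ok = matmul(A, adjoint(A)) == delta_I
print(delta, left_ok, right_ok)
```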



Formula (5.8) can be used to write the solution of a system of linear equations 
AX ~ B, where ^ is an « x « matrix in a compact form, provided that det A ^ 0. 
Multiplying both sides by A' 1 , we obtain 

(5,10) X = A~ l B ^ — (adj A)fi, 

o 

where 8 = det A. The product on the right can be expanded out to obtain the for¬ 
mula 


(5_U) Xj = 石 io ； i) + … b n a n /)， 

where atj = ±det as above. 

Notice that the main term $(b_1 \alpha_{1j} + \cdots + b_n \alpha_{nj})$ on the right side of (5.11)
looks like the expansion of the determinant by minors on the $j$th column, except that
$b_i$ has replaced $a_{ij}$. We can incorporate this observation to get another expression for
the solution of the system of equations. Let us form a new matrix $M_j$, replacing the
$j$th column of $A$ by the column vector $B$. Expansion by minors on the $j$th column
shows that

    $\det M_j = (b_1 \alpha_{1j} + \cdots + b_n \alpha_{nj}).$

This gives us the tricky formula

(5.12) $x_j = \frac{\det M_j}{\det A}.$

For some reason it is popular to write the solution of the system of equations $AX = B$
in this form, and it is often this form that is called Cramer’s Rule. However, this
expression does not simplify computation. The main thing to remember is expression
(5.8) for the inverse of a matrix in terms of its adjoint; the other formulas follow
from this expression.
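To make formulas (5.8) and (5.12) concrete, here is a minimal Python sketch. It assumes a square matrix with integer entries stored as a list of rows, and the function names (`minor`, `det`, `adjugate`, `inverse`, `cramer`) are illustrative choices, not notation from the text. Determinants are computed by expansion by minors, so this is only sensible for small matrices.

```python
from fractions import Fraction


def minor(m, i, j):
    """Return the matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by expansion by minors on the first column."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** i * m[i][0] * det(minor(m, i, 0)) for i in range(len(m)))


def adjugate(m):
    """adj A: its (i, j) entry is alpha_ji = (-1)^(i+j) det A_ji."""
    n = len(m)
    if n == 1:
        return [[1]]
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)]
            for i in range(n)]


def inverse(m):
    """Formula (5.8): A^{-1} = (1/delta) adj A, assuming integer entries."""
    delta = det(m)
    if delta == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(entry, delta) for entry in row] for row in adjugate(m)]


def cramer(m, b):
    """Formula (5.12): x_j = det M_j / det A, where M_j is A with column j
    replaced by the column vector b."""
    delta = det(m)
    if delta == 0:
        raise ValueError("matrix is not invertible")
    solution = []
    for j in range(len(m)):
        m_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(m)]
        solution.append(Fraction(det(m_j), delta))
    return solution


# Example: a 2 x 2 system with determinant 1.
example_solution = cramer([[2, 1], [5, 3]], [1, 2])
```

Exact rational arithmetic (`Fraction`) is used so that the quotients of integer polynomials described in the text are not blurred by rounding.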

As with the complete expansion of the determinant (4.10), formulas
(5.8-5.11) have theoretical as well as practical significance, because the answers
$A^{-1}$ and $X$ are exhibited explicitly as quotients of polynomials in the variables $\{a_{ij}, b_i\}$,
with integer coefficients. If, for instance, $a_{ij}$ and $b_j$ are all continuous functions of $t$,
so are the solutions $x_i$.


A general algebraical determinant in its developed form
may be likened to a mixture of liquids seemingly homogeneous,
but which, being of differing boiling points, admit of being separated
by the process of fractional distillation.

James Joseph Sylvester


EXERCISES 


1. The Basic Operations


1. What are the entries $a_{21}$ and $a_{23}$ of the matrix $\begin{bmatrix} 1 & 2 & 5 \\ 2 & 7 & 8 \\ 0 & 9 & 4 \end{bmatrix}$?

2. Compute the products AB and BA for the following values of A and B. 



"l 2 3' 

3 3 1 


—8 _4 

(a) A - 

, B = 

9 5 


一 一 


-3-2 


(b)A 


4 




6-4 
—3 2 


(C) A 


—1 
0 


,B = [l 2 1 ] 


3. Let $A = (a_1, \ldots, a_n)$ be a row vector, and let $B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$ be a column vector. Compute
the products $AB$ and $BA$.

4. Verify the associative law for the matrix product

Notice that this is a self-checking problem. You have to multiply correctly, or it won’t
come out. If you need more practice in matrix multiplication, use this problem as a
model.

i- —I r* 

1 b 


5, Compute the product 


a 


6. Compute 


7* Find a formula for 


1 1 1 


n 


， and prove it by induction 


8. Compute the following matrix products by block multiplication: 


1 

1 

1 

1 

5 

0 

1 

0 

1 

1 

0 

0 

1 

0 

1 

1 

0 


1 

2 

1 

0 

0 

1 

0 

1 

1 

0 

0 

1 

0 

1 

1 

3 


9. Prove rule (1.20) for block multiplication. 

10. Let $A, B$ be square matrices.

(a) When is $(A + B)(A - B) = A^2 - B^2$?

(b) Expand $(A + B)^3$.

11, Let D be the diagonal matrix 


0 

1 2 

0 

1 0 

3 

0 1 


1 

蜱1， 

2 3 

4 

2 3 

5 

0 4 

一 


i * 


r 




1 


d 2 


dn 


and let $A = (a_{ij})$ be any $n \times n$ matrix.

(a) Compute the products $DA$ and $AD$.

(b) Compute the product of two diagonal matrices.

(c) When is a diagonal matrix invertible?

12. An $n \times n$ matrix is called upper triangular if $a_{ij} = 0$ whenever $i > j$. Prove that the
product of two upper triangular matrices is upper triangular.

13. In each case, find all real 2x2 matrices which commute with the given matrix. 


(a) 


0 


0 

0 


(b) 


0 

0 


0 


(c) 


2 

0 


0 

6 


(d) 


0 


(e) 


2 

0 


6 


14. Prove the properties $0 + A = A$, $0A = 0$, and $A0 = 0$ of zero matrices.

15. Prove that a matrix which has a row of zeros is not invertible.

16. A square matrix $A$ is called nilpotent if $A^k = 0$ for some $k > 0$. Prove that if $A$ is
nilpotent, then $I + A$ is invertible.

17. (a) Find infinitely many matrices $B$ such that $BA = I_2$ when


A 


2 


(b) Prove that there is no matrix $C$ such that $AC = I_3$.





18. Write out the proof of Proposition (1.18) carefully, using the associative law to expand
the product $(AB)(B^{-1}A^{-1})$.

19. The trace of a square matrix is the sum of its diagonal entries:

    $\operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{nn}.$

(a) Show that $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$, and that $\operatorname{tr} AB = \operatorname{tr} BA$.

(b) Show that if $B$ is invertible, then $\operatorname{tr} A = \operatorname{tr} BAB^{-1}$.

20. Show that the equation $AB - BA = I$ has no solutions in $n \times n$ matrices with real entries.

2. Row Reduction 


1. (a) For the reduction of the matrix $M$ (2.10) given in the text, determine the elementary
matrices corresponding to each operation.

(b) Compute the product $P$ of these elementary matrices and verify that $PM$ is indeed the
end result.

2. Find all solutions of the system of equations $AX = B$ when




— 1 

2 

1 r 



A = 

3 ' 

0 0 4 




_1 - 

4-2 - 2 一 


and B has the following value: 

1— 1 

1 





0 


1 


0 

(a) 

0 

(b) 

1 

(C) 

2 


0 


0 

一 _ 


2 

3. Find all solutions of the equation $x_1 + x_2 + 2x_3 - x_4 = 3$.


4. Determine the elementary matrices which are used in the row reduction in Example
(2.22) and verify that their product is $A^{-1}$.

5. Find inverses of the following matrices:



6 . Make a sketch showing the effect of multiplication by the matrix A = ^ 3 on the 
plane R 2 . 

7. How much can a matrix be simplified if both row and column operations are allowed? 

8. (a) Compute the matrix product $e_{ij}e_{k\ell}$.

(b) Write the identity matrix as a sum of matrix units.

(c) Let $A$ be any $n \times n$ matrix. Compute $e_{ii}Ae_{jj}$.

(d) Compute $e_{ij}A$ and $Ae_{ij}$.

9. Prove rules (2.7) for the operations of elementary matrices.

10. Let $A$ be a square matrix. Prove that there is a set of elementary matrices $E_1, \ldots, E_k$
such that $E_k \cdots E_1 A$ either is the identity or has its bottom row zero.

11. Prove that every invertible 2 × 2 matrix is a product of at most four elementary matrices.

12. Prove that if a product $AB$ of $n \times n$ matrices is invertible then so are the factors $A, B$.

13. A matrix $A$ is called symmetric if $A = A^t$. Prove that for any matrix $A$, the matrix $AA^t$ is
symmetric and that if $A$ is a square matrix then $A + A^t$ is symmetric.

14. (a) Prove that $(AB)^t = B^tA^t$ and that $A^{tt} = A$.

(b) Prove that if $A$ is invertible then $(A^{-1})^t = (A^t)^{-1}$.

15. Prove that the inverse of an invertible symmetric matrix is also symmetric.

16. Let $A$ and $B$ be symmetric $n \times n$ matrices. Prove that the product $AB$ is symmetric if and
only if $AB = BA$.

17. Let $A$ be an $n \times n$ matrix. Prove that the operator “left multiplication by $A$” determines $A$
in the following sense: If $AX = BX$ for every column vector $X$, then $A = B$.

18. Consider an arbitrary system of linear equations AX = B where A and B have real entries. 

(a) Prove that if the system of equations AX = B has more than one solution then it has 
infinitely many. 

(b) Prove that if there is a solution in the complex numbers then there is also a real solu¬ 
tion. 

*19. Prove that the reduced row echelon form obtained by row reduction of a matrix A is 
uniquely determined by A. 


3. Determinants

1. Evaluate the following determinants:



1 * 


1 1 


2 0 1 

(a) 

1 1 

2 - i 3 
一 •— 

(b) 

1 丄 

1—1 

(c) 

0 1 0 

1 0 2 



0 0 
0 0 
3 0 
7 4 


(e) 


1 4 

2 3 
4 1 
2 0 


1 3 

5 0 

0 0 

0 0 
■ 


2. Prove that det 


1 

2 

5 

■ 

6 


2 

1 

5 

■ 

1 

3 

1 

7 

7 

二 —det 

1 

3 

7 

0 

0 

0 

2 

3 

0 

0 

2 

1 

4 

2 

1 

5 


2 

•m 

4 

1 

4 


3. Verify the rule det AB = (det A)(det B) for the matrices A = 




Note that this is a self-checking problem. It can be used as a model for practice in computing
determinants.


4. Compute the determinant of the following $n \times n$ matrices by induction on $n$.


(a) 


mwm ， 

1 


— ■ 

2-1 

1 


- 1 2-1 

_ 

* 

• 

(b) 

-1 2-1 

* 

• 


-1 • 

1 


*2-1 

1 


-1 2 


5. Evaluate det 


1 

2 

3 


2 3 

2 3 

3 3 


n 


n 


* _ _ 


n 
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*6* Compute det 


2 


2 

1 


2 1 
1 2 
1 


2 1 
1 2 
1 

1 


2 


2 




7. Prove that the determinant is linear in the rows of a matrix, as asserted in (3.6).

8. Let $A$ be an $n \times n$ matrix. What is $\det(-A)$?

9. Prove that $\det A^t = \det A$.

10. Derive the formula $\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$ from the properties (3.5, 3.6, 3.7, 3.9).

11. Let $A$ and $B$ be square matrices. Prove that $\det(AB) = \det(BA)$.

12. Prove that $\det \begin{bmatrix} A & B \\ 0 & D \end{bmatrix} = (\det A)(\det D)$, if $A$ and $D$ are square blocks.

*13. Let a $2n \times 2n$ matrix be given in the form $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$, where each block is an $n \times n$
matrix. Suppose that $A$ is invertible and that $AC = CA$. Prove that $\det M = \det(AD - CB)$.
Give an example to show that this formula need not hold when $AC \ne CA$.


4. Permutation Matrices 

1. Consider the permutation $p$ defined by $1 \rightsquigarrow 3$, $2 \rightsquigarrow 1$, $3 \rightsquigarrow 4$, $4 \rightsquigarrow 2$.

(a) Find the associated permutation matrix $P$.

(b) Write $p$ as a product of transpositions and evaluate the corresponding matrix product.

(c) Compute the sign of $p$.

2. Prove that every permutation matrix is a product of transpositions.

3. Prove that every matrix with a single 1 in each row and a single 1 in each column, the
other entries being zero, is a permutation matrix.

4. Let $p$ be a permutation. Prove that $\operatorname{sign} p = \operatorname{sign} p^{-1}$.

5. Prove that the transpose of a permutation matrix $P$ is its inverse.

6. What is the permutation matrix associated to the permutation $i \rightsquigarrow n - i$?

7. (a) The complete expansion for the determinant of a 3 × 3 matrix consists of six triple
products of matrix entries, with sign. Learn which they are.

(b) Compute the determinant of the following matrices using the complete expansion,
and check your work by another method:

    $\begin{bmatrix} 1 & 1 & 2 \\ 2 & 4 & 2 \\ 0 & 2 & 1 \end{bmatrix}, \quad \begin{bmatrix} 4 & -1 & 1 \\ 1 & 1 & -2 \\ 1 & -1 & 1 \end{bmatrix}, \quad \begin{bmatrix} a & b & c \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$


8. Prove that the complete expansion (4.12) defines the determinant by verifying rules
(3.5-3.7).

9. Prove that formulas (4.11) and (4.12) define the same number.

5. Cramer’s Rule


1. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a matrix with determinant 1. What is $A^{-1}$?

2. (self-checking) Compute the adjoints of the matrices

    $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 & 2 \\ 2 & 4 & 2 \\ 0 & 2 & 1 \end{bmatrix}, \quad \begin{bmatrix} 4 & -1 & 1 \\ 1 & 1 & -2 \\ 1 & -1 & 1 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} a & b & c \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix},$

and verify Theorem (5.7) for them.


3. Let $A$ be an $n \times n$ matrix with integer entries $a_{ij}$. Prove that $A^{-1}$ has integer entries if and
only if $\det A = \pm 1$.

4. Prove that expansion by minors on a row of a matrix defines the determinant function. 


Miscellaneous Problems 


1. Write the matrix 



2 

4 


as a product of elementary matrices，using as few as you can. 


Prove that your expression is as short as possible. 

2. Find a representation of the complex numbers by real 2 × 2 matrices which is compatible
with addition and multiplication. Begin by finding a nice solution to the matrix equation
$A^2 = -I$.


3. (Vandermonde determinant) (a) Prove that $\det \begin{bmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{bmatrix} = (b - a)(c - a)(c - b)$.


*(b) Prove an analogous formula for $n \times n$ matrices by using row operations to clear out
the first column cleverly.

*4. Consider a general system $AX = B$ of $m$ linear equations in $n$ unknowns. If the coefficient
matrix $A$ has a left inverse $A'$, a matrix such that $A'A = I_n$, then we may try to solve the
system as follows:

    $AX = B$
    $A'AX = A'B$
    $X = A'B.$

But when we try to check our work by running the solution backward, we get into
trouble:

    $X = A'B$
    $AX = AA'B$
    $AX \overset{?}{=} B.$

We seem to want $A'$ to be a right inverse: $AA' = I_m$, which isn’t what was given. Explain.
(Hint: Work out some examples.)

5. (a) Let $A$ be a real 2 × 2 matrix, and let $A_1, A_2$ be the rows of $A$. Let $P$ be the
parallelogram whose vertices are $0, A_1, A_2, A_1 + A_2$. Prove that the area of $P$ is the absolute
value of the determinant $\det A$ by comparing the effect of an elementary row operation
on the area and on $\det A$.

*(b) Prove an analogous result for $n \times n$ matrices.

*6. Most invertible matrices can be written as a product $A = LU$ of a lower triangular matrix
$L$ and an upper triangular matrix $U$, where in addition all diagonal entries of $U$ are 1.

(a) Prove uniqueness, that is, prove that there is at most one way to write $A$ as a product.

(b) Explain how to compute $L$ and $U$ when the matrix $A$ is given.

(c) Show that every invertible matrix can be written as a product $LPU$, where $L, U$ are as
above and $P$ is a permutation matrix.

7. Consider a system of $n$ linear equations in $n$ unknowns: $AX = B$, where $A$ and $B$
have integer entries. Prove or disprove the following.

(a) The system has a rational solution if $\det A \ne 0$.

(b) If the system has a rational solution, then it also has an integer solution.

*8. Let $A, B$ be $m \times n$ and $n \times m$ matrices. Prove that $I_m - AB$ is invertible if and only if
$I_n - BA$ is invertible.
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Groups 

There are few notions in mathematics that are more primitive
than that of a law of composition.

Nicolas Bourbaki


1. THE DEFINITION OF A GROUP

In this chapter we study one of the most important algebraic concepts, that of a
group. A group is a set on which a law of composition is defined, such that all
elements have inverses. The precise definition is given below in (1.10). For example,
the set of nonzero real numbers forms a group $\mathbb{R}^\times$ under multiplication, and the set
of all real numbers forms a group $\mathbb{R}^+$ under addition. The set of invertible $n \times n$
matrices, called the general linear group, is a very important example in which the
law of composition is matrix multiplication. We will see many more examples as we
go along.

By a law of composition on a set $S$, we mean a rule for combining pairs $a, b$ of
elements of $S$ to get another element, say $p$, of $S$. The original models for this notion
are addition and multiplication of real numbers. Formally, a law of composition is a
function of two variables on $S$, with values in $S$, or it is a map

    $S \times S \to S$
    $a, b \rightsquigarrow p.$

Here, $S \times S$ denotes, as always, the product set of pairs $(a, b)$ of elements of $S$.

Functional notation $p = f(a, b)$ isn’t very convenient for laws of composition.
Instead, the element obtained by applying the law to a pair $(a, b)$ is usually denoted
using a notation resembling those used for multiplication or addition:

    $p = ab,\ a \times b,\ a \circ b,\ a + b,$ and so on,

a choice being made for the particular law in question. We call the element $p$ the
product or sum of $a$ and $b$, depending on the notation chosen.

Our first example of a law of composition, and one of the two main examples,
is matrix multiplication on the set $S$ of $n \times n$ matrices.

We will use the product notation $ab$ most frequently. Anything we prove with
product notation can be rewritten using another notation, such as addition. It will
continue to be valid, because the rewriting is just a change of notation.

It is important to note that the symbol ab is a notation for a certain element of 
S. Namely，it is the element obtained by applying the given law of composition to 
the elements called a and b. Thus if the law is multiplication of matrices and if 



a = 


0 



2 


and b 


2 


0 



then ab denotes the matrix 



4 



2 


Once the 


product ab has been evaluated, the elements a and b can not be recovered from it. 

Let us consider a law of composition written multiplicatively as $ab$. It will be
called associative if the rule

(1.1) $(ab)c = a(bc)$ (associative law)

holds for all $a, b, c$ in $S$, and commutative if

(1.2) $ab = ba$ (commutative law)

holds for all $a, b$ in $S$. Our example of matrix multiplication is associative but not
commutative.

When discussing groups in general, we will use multiplicative notation. It is
customary to reserve additive notation $a + b$ for commutative laws of composition,
that is, when $a + b = b + a$ for all $a, b$. Multiplicative notation carries no implication
either way concerning commutativity.

In additive notation the associative law is $(a + b) + c = a + (b + c)$, and in
functional notation it is

    $f(f(a, b), c) = f(a, f(b, c)).$

This ugly formula illustrates the fact that functional notation isn’t convenient for
algebraic manipulation.

The associative law is more fundamental than the commutative law; one reason
for this is that composition of functions, our second example of a law of composition,
is associative. Let $T$ be a set, and let $g, f$ be functions (or maps) from $T$ to $T$.
Let $g \circ f$ denote the composed map $t \rightsquigarrow g(f(t))$. The rule

    $g, f \rightsquigarrow g \circ f$

is a law of composition on the set $S = \operatorname{Maps}(T, T)$ of all maps $T \to T$.

As is true for matrix multiplication, composition of functions is an associative
law. For if $f, g, h$ are three maps from $T$ to itself, then $(h \circ g) \circ f = h \circ (g \circ f)$.







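This is clear, since both of the composed maps send $t \rightsquigarrow h(g(f(t)))$.

The simplest example is that $T$ is a set of two elements $\{a, b\}$. Then there are
four maps $T \to T$:

$i$: the identity map, defined by $i(a) = a$, $i(b) = b$;
$\tau$: the transposition, defined by $\tau(a) = b$, $\tau(b) = a$;
$\alpha$: the constant function $\alpha(a) = \alpha(b) = a$;
$\beta$: the constant function $\beta(a) = \beta(b) = b$.

The law of composition on $S$ can be exhibited in a multiplication table as follows:

(1.3) $\begin{array}{c|cccc} & i & \tau & \alpha & \beta \\ \hline i & i & \tau & \alpha & \beta \\ \tau & \tau & i & \beta & \alpha \\ \alpha & \alpha & \alpha & \alpha & \alpha \\ \beta & \beta & \beta & \beta & \beta \end{array}$

which is to be read in this way: the entry in the row labeled $u$ and the column
labeled $v$ is $u \circ v$.

Thus $\tau \circ \alpha = \beta$, while $\alpha \circ \tau = \alpha$. Composition of functions is not commutative.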
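The table can be regenerated mechanically from the four definitions. The following Python sketch is one way to do it; representing each map as a dictionary and the names `maps`, `compose`, and `table` are illustrative assumptions, not notation from the text.

```python
from itertools import product

T = ("a", "b")

# The four maps T -> T, each stored as a dictionary t -> image of t.
maps = {
    "i": {"a": "a", "b": "b"},      # identity
    "tau": {"a": "b", "b": "a"},    # transposition
    "alpha": {"a": "a", "b": "a"},  # constant map with value a
    "beta": {"a": "b", "b": "b"},   # constant map with value b
}


def compose(u, v):
    """Return the name of u o v, the map sending t to u(v(t))."""
    composed = {t: maps[u][maps[v][t]] for t in T}
    return next(name for name, f in maps.items() if f == composed)


# table[(u, v)] is the entry in row u, column v of the multiplication table.
table = {(u, v): compose(u, v) for u, v in product(maps, repeat=2)}

for u in maps:
    print(u.ljust(6), " ".join(table[(u, v)].ljust(6) for v in maps))
```

Comparing `table[("tau", "alpha")]` with `table[("alpha", "tau")]` reproduces the failure of commutativity noted above, while looping over all triples confirms associativity for this small example.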

Going back to a general law of composition, suppose we want to define the
product of a string of $n$ elements of a set:

    $a_1 a_2 \cdots a_n = \ ?$

There are various ways to do this using the given law, which tells us how to multiply
two elements. For instance, we could first use the law to find the product $a_1 a_2$, then
multiply this element by $a_3$, and so on:

    $((a_1 a_2) a_3) a_4 \cdots.$

When $n = 4$, there are four other ways to combine the same elements; $(a_1 a_2)(a_3 a_4)$
is one of them. It can be proved by induction that if the law is associative, then all
such products are equal. This allows us to speak of the product of an arbitrary string
of elements.

(1.4) Proposition. Suppose an associative law of composition is given on a set $S$.
There is a unique way to define, for every integer $n$, a product of $n$ elements
$a_1, \ldots, a_n$ of $S$ (we denote it temporarily by $[a_1 \cdots a_n]$) with the following properties:

(i) the product $[a_1]$ of one element is the element itself;

(ii) the product $[a_1 a_2]$ of two elements is given by the law of composition;

(iii) for any integer $i$ between 1 and $n$, $[a_1 \cdots a_n] = [a_1 \cdots a_i][a_{i+1} \cdots a_n]$.

The right side of equation (iii) means that the two products $[a_1 \cdots a_i]$ and $[a_{i+1} \cdots a_n]$
are formed first and the results are then multiplied using the given law of composition.


Proof. We use induction on $n$. The product is defined by (i) and (ii) for $n \le 2$,
and it does satisfy (iii) when $n = 2$. Suppose that we know how to define the
product of $r$ elements when $r \le n - 1$, and that this product is the unique product
satisfying (iii). We then define the product of $n$ elements by the rule

    $[a_1 \cdots a_n] = [a_1 \cdots a_{n-1}][a_n],$

where the terms on the right side are those already defined. If a product satisfying
(iii) exists, then this formula gives the product because it is (iii) when $i = n - 1$. So
if it exists, the product is unique. We must now check (iii) for $i < n - 1$:

    $[a_1 \cdots a_n] = [a_1 \cdots a_{n-1}][a_n]$ (our definition)
    $= ([a_1 \cdots a_i][a_{i+1} \cdots a_{n-1}])[a_n]$ (induction hypothesis)
    $= [a_1 \cdots a_i]([a_{i+1} \cdots a_{n-1}][a_n])$ (associative law)
    $= [a_1 \cdots a_i][a_{i+1} \cdots a_n]$ (induction hypothesis).

This completes the proof. We will drop the brackets from now on and denote the
product by $a_1 \cdots a_n$. □
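Proposition (1.4) can be explored experimentally: for an associative law, every way of inserting parentheses into a fixed string of elements gives the same value. The sketch below is only an illustration under that reading; the helper names `all_products` and `count_parenthesizations` are hypothetical, string concatenation stands in for an associative but noncommutative law, and integer subtraction serves as a non-associative contrast.

```python
def all_products(elems, op):
    """Values obtained from every way of parenthesizing elems,
    keeping the order of the elements fixed."""
    if len(elems) == 1:
        return {elems[0]}
    results = set()
    for i in range(1, len(elems)):
        # Split as [a_1 ... a_i][a_{i+1} ... a_n], as in property (iii).
        for left in all_products(elems[:i], op):
            for right in all_products(elems[i:], op):
                results.add(op(left, right))
    return results


def count_parenthesizations(n):
    """Number of ways to fully parenthesize a string of n elements."""
    if n == 1:
        return 1
    return sum(count_parenthesizations(i) * count_parenthesizations(n - i)
               for i in range(1, n))


# Associative, noncommutative law: every parenthesization agrees.
concat_values = all_products(("a", "b", "c", "d"), lambda s, t: s + t)

# Non-associative law: different parenthesizations can disagree.
subtract_values = all_products((1, 2, 3, 4), lambda s, t: s - t)
```

For four elements, `count_parenthesizations(4)` counts the left-to-right product together with the four other ways mentioned in the text.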

An identity for a law of composition is an element $e$ of $S$ having the property
that

(1.5) $ea = a$ and $ae = a$, for all $a \in S$.

There can be at most one identity element. For if $e, e'$ were two such elements, then
since $e$ is an identity, $ee' = e'$, and since $e'$ is an identity, $ee' = e$. Thus $e = e'$.
Both of our examples, matrix multiplication and composition of functions,
have an identity. For $n \times n$ matrices it is the identity matrix $I$, and for $\operatorname{Maps}(T, T)$ it
is the identity map, which carries each element of $T$ to itself.

Often the identity is denoted by 1 if the law of composition is written
multiplicatively, or by 0 if it is written additively. These elements do not need to be
related to the numbers 1 and 0, but they share the property of being identity elements
for their laws of composition.

Suppose that our law of composition has an identity, and let us use the symbol
1 for it. An element $a \in S$ is called invertible if there is another element $b$ such that

    $ab = 1$ and $ba = 1$.

As with matrix multiplication [Chapter 1 (1.17)], it follows from the associative
law that the inverse is unique if it exists. It is denoted by $a^{-1}$:

    $aa^{-1} = a^{-1}a = 1.$

Inverses multiply in the opposite order:

(1.6) $(ab)^{-1} = b^{-1}a^{-1}.$

The proof is the same as for matrices [Chapter 1 (1.18)].

Power notation may be used for an associative law of composition:

(1.7) $a^n = a \cdots a$ ($n$ times), for $n \ge 1$;
      $a^0 = 1$, provided the identity exists;
      $a^{-n} = a^{-1} \cdots a^{-1}$, provided $a$ is invertible.

The usual rules for manipulation of powers hold:

(1.8) $a^{r+s} = a^r a^s$ and $(a^r)^s = a^{rs}$.


It isn’t advisable to introduce fraction notation

(1.9) $\frac{b}{a}$

unless the law of composition is commutative, for it is not clear from the notation
whether the fraction stands for $ba^{-1}$ or $a^{-1}b$, and these two elements may be different.

When additive notation is used for the law of composition, the inverse is
denoted by $-a$, and the power notation $a^n$ is replaced by the notation $na =
a + \cdots + a$, as with addition of real numbers.


(1.10) Definition. A group is a set $G$ together with a law of composition which is
associative and has an identity element, and such that every element of $G$ has an
inverse.


It is customary to denote the group and the set of its elements by the same symbol.
An abelian group is a group whose law of composition is commutative. Additive
notation is often used for abelian groups. Here are some simple examples of
abelian groups:

(1.11) $\mathbb{Z}^+$: the integers, with addition;

$\mathbb{R}^+$: the real numbers, with addition;

$\mathbb{R}^\times$: the nonzero real numbers, with multiplication;

$\mathbb{C}^+, \mathbb{C}^\times$: the analogous groups, where the set $\mathbb{C}$ of complex numbers
replaces the real numbers $\mathbb{R}$.

Here is an important property of groups: 


(1.12) Proposition. Cancellation Law: Let $a, b, c$ be elements of a group $G$. If
$ab = ac$, then $b = c$. If $ba = ca$, then $b = c$.

Proof. Multiply both sides of $ab = ac$ by $a^{-1}$ on the left:

    $b = a^{-1}ab = a^{-1}ac = c.$ □

Multiplication by $a^{-1}$ in this proof is not a trick; it is essential. If an element $a$ is not
invertible, the cancellation law need not hold. For instance, $0 \cdot 1 = 0 \cdot 2$, or



The two most basic examples of groups are obtained from the examples of laws
of composition that we have considered (multiplication of matrices and composition
of functions) by leaving out the elements which are not invertible. As we remarked
in Chapter 1, the $n \times n$ general linear group is the group of all invertible $n \times n$
matrices. It is denoted by

(1.13) $GL_n = \{n \times n \text{ matrices } A \text{ with } \det A \ne 0\}.$

If we want to indicate that we are working with real or complex matrices, we write

    $GL_n(\mathbb{R})$ or $GL_n(\mathbb{C})$,

according to the case.

In the set $S = \operatorname{Maps}(T, T)$ of functions, a map $f\colon T \to T$ has an inverse function
if and only if it is bijective. Such a map is also called a permutation of $T$. The
set of permutations forms a group. In Example (1.3), the invertible elements are $i$
and $\tau$, and they form a group with two elements. These two elements are the
permutations of the set $\{a, b\}$.

The group of permutations of the set $\{1, 2, \ldots, n\}$ of integers from 1 to $n$ is
called the symmetric group and is denoted by $S_n$:

(1.14) $S_n$ = group of permutations of $\{1, \ldots, n\}$.

Because there are $n!$ permutations of a set of $n$ elements, this group contains $n!$
elements. (We say that the order of the group is $n!$.) The symmetric group $S_2$ consists of
the two elements $i$ and $\tau$, where $i$ denotes the identity permutation and $\tau$ denotes the
transposition which interchanges 1, 2 as in (1.3). The group law, composition of
functions, is described by the fact that $i$ is the identity element and by the relation
$\tau\tau = \tau^2 = i$.

The structure of $S_n$ becomes complicated very rapidly as $n$ increases, but we
can work out the case $n = 3$ fairly easily. The symmetric group $S_3$ contains six
elements. It will be an important example for us because it is the smallest group whose
law of composition is not commutative. To describe this group, we pick two particular
permutations $x, y$ in terms of which we can write all others. Let us take for $x$ the
cyclic permutation of the indices. It is represented by matrix (4.3) from Chapter 1:

(1.15) $x = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$

For $y$, we take the transposition which interchanges 1, 2, fixing 3:

(1.16) $y = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$

The six permutations of $\{1, 2, 3\}$ are

(1.17) $\{1, x, x^2, y, xy, x^2y\} = \{x^i y^j \mid 0 \le i \le 2,\ 0 \le j \le 1\},$

where 1 denotes the identity permutation. This can be verified by computing the
products.

The rules

(1.18) $x^3 = 1, \quad y^2 = 1, \quad yx = x^2y$

can also be verified directly. They suffice for computation in the group $S_3$. Any
product of the elements $x, y$ and of their inverses, such as $x^{-1}y^3x^2y$ for instance, can
be brought into the form $x^i y^j$ with $0 \le i \le 2$ and $0 \le j \le 1$ by applying the above
rules repeatedly. To do so, we move all occurrences of $y$ to the right side using the
last relation and bring the exponents into the indicated ranges using the first two
relations:

    $x^{-1}y^3x^2y = x^2yx^2y = x^2(yx)xy = x^2x^2yxy = \cdots = x^6y^2 = 1.$

Therefore one can write out a complete multiplication table for $S_3$ with the aid of
these rules. Because of this, the rules are called defining relations for the group, a
concept which we will study formally in Chapter 6.

Note that the commutative law does not hold in $S_3$, because $yx \ne xy$.
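The relations (1.18) and the list (1.17) can be checked by direct matrix multiplication. The Python sketch below assumes that $x$ is the cyclic permutation matrix of (1.15) and that $y$ is the permutation matrix of the transposition interchanging 1 and 2; the helper names `matmul` and `power` are illustrative.

```python
from itertools import product

ONE = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
X = ((0, 1, 0), (0, 0, 1), (1, 0, 0))  # cyclic permutation, as in (1.15)
Y = ((0, 1, 0), (1, 0, 0), (0, 0, 1))  # transposition of 1 and 2, fixing 3


def matmul(p, q):
    """Product of two square matrices stored as tuples of rows."""
    n = len(p)
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )


def power(p, n):
    """p multiplied by itself n times (n >= 0), starting from the identity."""
    result = ONE
    for _ in range(n):
        result = matmul(result, p)
    return result


# The six elements x^i y^j with 0 <= i <= 2 and 0 <= j <= 1, as in (1.17).
elements = {
    (i, j): matmul(power(X, i), power(Y, j))
    for i, j in product(range(3), range(2))
}

# The sample word x^{-1} y^3 x^2 y from the text, using x^{-1} = x^2.
word = matmul(matmul(matmul(power(X, 2), power(Y, 3)), power(X, 2)), Y)

relations_hold = (
    power(X, 3) == ONE
    and power(Y, 2) == ONE
    and matmul(Y, X) == matmul(power(X, 2), Y)
)
```

Here `relations_hold` records whether the three rules of (1.18) are satisfied, and comparing `word` with `ONE` repeats the worked reduction in the text.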


2. SUBGROUPS 


One reason that the general linear group and the symmetric group are so important
is that many other groups are contained in them as subgroups. A subset $H$ of a group
$G$ is called a subgroup if it has the following properties:

(2.1) (a) Closure: If $a \in H$ and $b \in H$, then $ab \in H$.

(b) Identity: $1 \in H$.

(c) Inverses: If $a \in H$, then $a^{-1} \in H$.

These conditions are explained as follows: The first condition (a) tells us that the law
of composition on the group $G$ can be used to define a law on $H$, called the induced
law of composition. The second and third conditions (b, c) say that $H$ is a group with
respect to this induced law. Notice that (2.1) mentions all parts of the definition of a
group except for the associative law. We do not need to mention associativity. It
carries over automatically from $G$ to $H$.

Every group has two obvious subgroups: the whole group and the subgroup $\{1\}$
consisting of the identity element alone. A subgroup is said to be a proper subgroup
if it is not one of these two.

Here are two examples of subgroups:

(2.2) Examples.

(a) The set $T$ of invertible upper triangular 2 × 2 matrices

    $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix} \quad (a, d \ne 0)$

is a subgroup of the general linear group $GL_2(\mathbb{R})$.

(b) The set of complex numbers of absolute value 1 (the set of points on the
unit circle in the complex plane) is a subgroup of $\mathbb{C}^\times$.

As a further example, we will determine the subgroups of the additive group
$\mathbb{Z}^+$ of integers. Let us denote the subset of $\mathbb{Z}$ consisting of all multiples of a given
integer $b$ by $b\mathbb{Z}$:

(2.3) $b\mathbb{Z} = \{n \in \mathbb{Z} \mid n = bk \text{ for some } k \in \mathbb{Z}\}.$

(2.4) Proposition. For any integer $b$, the subset $b\mathbb{Z}$ is a subgroup of $\mathbb{Z}^+$. Moreover,
every subgroup $H$ of $\mathbb{Z}^+$ is of the type $H = b\mathbb{Z}$ for some integer $b$.
Proof. We leave the verification that $b\mathbb{Z}$ is a subgroup as an exercise and proceed
to show that every subgroup has this form. Let $H$ be a subgroup of $\mathbb{Z}^+$. Remember
that the law of composition on $\mathbb{Z}^+$ is addition, the identity element is 0, and
the inverse of $a$ is $-a$. So the axioms for a subgroup read

(i) if $a \in H$ and $b \in H$, then $a + b \in H$;

(ii) $0 \in H$;

(iii) if $a \in H$, then $-a \in H$.

By axiom (ii), $0 \in H$. If 0 is the only element of $H$, then $H = 0\mathbb{Z}$, so that case is
settled. If not, there is a positive integer in $H$. For let $a \in H$ be any nonzero element.
If $a$ is negative, then $-a$ is positive, and axiom (iii) tells us that $-a$ is in $H$.
We choose for $b$ the smallest positive integer in $H$, and we claim that $H = b\mathbb{Z}$. We
first show that $b\mathbb{Z} \subset H$, in other words, that $bk \in H$ for every integer $k$. If $k$ is a
positive integer, then $bk = b + b + \cdots + b$ ($k$ terms). This element is in $H$ by axiom
(i) and induction. So is $b(-k) = -bk$, by axiom (iii). Finally, axiom (ii) tells us
that $b0 = 0 \in H$.

Next we show that $H \subset b\mathbb{Z}$, that is, that every element $n \in H$ is an integer
multiple of $b$. We use division with remainder to write $n = bq + r$, where $q, r$ are
integers and where the remainder $r$ is in the range $0 \le r < b$. Then $n$ and $bq$ are
both in $H$, and axioms (iii) and (i) show that $r = n - bq$ is in $H$ too. Now by our
choice, $b$ is the smallest positive integer in $H$, while $0 \le r < b$. Therefore $r = 0$,
and $n = bq \in b\mathbb{Z}$, as required. □

The elements of the subgroup $b\mathbb{Z}$ can be described as the integers which are
divisible by $b$. This description leads to a striking application of Proposition (2.4) to
subgroups which are generated by two integers $a, b$. Let us assume that $a$ and $b$ are
not both zero. The set

(2.5) $a\mathbb{Z} + b\mathbb{Z} = \{n \in \mathbb{Z} \mid n = ar + bs \text{ for some integers } r, s\}$

is a subgroup of $\mathbb{Z}^+$. It is called the subgroup generated by $a$ and $b$, because it is the
smallest subgroup which contains both of these elements. Proposition (2.4) tells us
that this subgroup has the form $d\mathbb{Z}$ for some integer $d$, so it is the set of integers
which are divisible by $d$. The generator $d$ is called the greatest common divisor of $a$
and $b$, for reasons which are explained in the following proposition:

(2.6) Proposition. Let $a, b$ be integers, not both zero, and let $d$ be the positive
integer which generates the subgroup $a\mathbb{Z} + b\mathbb{Z}$. Then

(a) $d$ can be written in the form $d = ar + bs$ for some integers $r$ and $s$.

(b) $d$ divides $a$ and $b$.

(c) If an integer $e$ divides $a$ and $b$, it also divides $d$.

Proof. The first assertion (a) just restates the fact that $d$ is contained in
$a\mathbb{Z} + b\mathbb{Z}$. Next, notice that $a$ and $b$ are in the subgroup $d\mathbb{Z} = a\mathbb{Z} + b\mathbb{Z}$. Therefore
$d$ divides $a$ and $b$. Finally, if $e$ is an integer which divides $a$ and $b$, then $a$ and $b$ are
in $e\mathbb{Z}$. This being so, any integer $n = ar + bs$ is also in $e\mathbb{Z}$. By assumption, $d$ has
this form, so $e$ divides $d$. □

If two integers $a, b$ are given, one way to find their greatest common divisor is
to factor each of them into prime integers and then collect the common ones. Thus
the greatest common divisor of $36 = 2 \cdot 2 \cdot 3 \cdot 3$ and $60 = 2 \cdot 2 \cdot 3 \cdot 5$ is $12 = 2 \cdot 2 \cdot 3$.
Properties (2.6b, c) are easy to verify. But without Proposition (2.4), the fact that
the integer determined by this method has the form $ar + bs$ would not be clear at
all. (In our example, $12 = 36 \cdot 2 - 60 \cdot 1$.) We will discuss the applications of this
fact to arithmetic in Chapter 11.
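Integers $r, s$ with $d = ar + bs$ can also be found without factoring. The sketch below uses the extended Euclidean algorithm, a standard technique that this section does not itself describe; the function name `extended_gcd` is an illustrative choice.

```python
def extended_gcd(a, b):
    """Return (d, r, s) with d the positive generator of aZ + bZ
    and d == a*r + b*s. Assumes a and b are integers, not both zero."""
    # Invariants: old_r == a*old_s + b*old_t and r == a*s + b*t.
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        # Normalize so that the generator d is positive.
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t


d, r, s = extended_gcd(36, 60)
print(d, r, s)  # d == 36*r + 60*s
```

For the example in the text, the call returns $d = 12$ together with a pair $r, s$ satisfying $12 = 36r + 60s$; such a pair is not unique, so other methods may produce different coefficients.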

We now come to an important abstract example of a subgroup, the cyclic sub¬ 
group generated by an arbitrary element x of a group G. We use multiplicative nota¬ 
tion. The cyclic subgroup H generated by x is the set of all powers of x: 

(2.7) H — {," ， jc— 2 , jT 1 ， 1 ， x ， jc 2 ，___}_ 

It is a subgroup of G — the smallest subgroup which contains x. But to interpret (2.7) 
correctly, we must remember that x n is a notation for a certain element of G. It may 
happen that there are repetitions in the list. For example, ifx = 1, then all elements 
in the list are equal to 1. We may distinguish two possibilities: Either the powers of 
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x are all distinct elements, or they are not. In the first case，the group H is called 
infinite cyclic• 

Suppose we have the second case, so that two powers are equal, say x n = x m , 
where n > m. Then x n ~ m = 1 [Cancellation Law (1_12)]，and so there is a nonzero 
power of x which is equal to 1. 

(2.8) Lemma. The set S of integers n such that $x^n = 1$ is a subgroup of $\mathbb{Z}^+$.

Proof. If $x^m = 1$ and $x^n = 1$, then $x^{m+n} = x^m x^n = 1$ too. This shows that
$m + n \in S$ if $m, n \in S$. So axiom (i) for a subgroup is verified. Also, axiom (ii)
holds because $x^0 = 1$. Finally, if $x^n = 1$, then $x^{-n} = x^n x^{-n} = x^0 = 1$. Thus
$-n \in S$ if $n \in S$. □


It follows from Lemma (2.8) and Proposition (2.4) that $S = m\mathbb{Z}$, where m is
the smallest positive integer such that $x^m = 1$. The m elements $1, x, \ldots, x^{m-1}$ are all
different. (If $x^i = x^j$ with $0 \le i < j < m$, then $x^{j-i} = 1$. But $j - i < m$, so this
is impossible.) Moreover, any power $x^n$ is equal to one of them: By division with
remainder, we may write $n = mq + r$ with remainder r less than m. Then
$x^n = (x^m)^q x^r = x^r$. Thus H consists of the following m elements:

(2.9) $H = \{1, x, \ldots, x^{m-1}\}$, these powers are distinct, and $x^m = 1$.


Such a group is called a cyclic group of order m. 
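To see the description of a finite cyclic group in a concrete case, here is a small sketch which lists the powers of an integer x modulo n. The choice of the multiplicative group of integers modulo n as the ambient group, and the name `cyclic_subgroup`, are assumptions made only for this illustration.

```python
def cyclic_subgroup(x: int, n: int) -> list[int]:
    """List 1, x, x^2, ... modulo n until the powers return to 1.

    Assumes gcd(x, n) == 1, so that x is invertible modulo n.
    """
    powers = [1 % n]
    p = x % n
    while p != 1 % n:
        powers.append(p)
        p = (p * x) % n
    return powers


H = cyclic_subgroup(2, 7)  # the powers of 2 modulo 7
m = len(H)                 # the order of 2 in this group
print(H, m)                # [1, 2, 4] 3

# Every power x^n equals x^r, where r is the remainder of n on division by m.
print(all(pow(2, n, 7) == H[n % m] for n in range(50)))  # True
```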

The order of any group G is the number of its elements. We will often denote
the order by

(2.10) |G| = number of elements of G.

Of course, the order may be infinite.

An element of a group is said to have order m (possibly infinity) if the cyclic
subgroup it generates has order m. This means that m is the smallest positive integer
with the property $x^m = 1$ or, if the order is infinite, that $x^m \ne 1$ for all $m \ne 0$.


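For example, the matrix $\begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix}$ is an element of order 6 in $GL_2(\mathbb{R})$, so the
cyclic subgroup it generates has order 6. On the other hand, the matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ has
infinite order, because

$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^n = \begin{bmatrix} 1 & n \\ 0 & 1 \end{bmatrix}$.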



We may also speak of the subgroup of a group G generated by a subset U. This
is the smallest subgroup of G containing U, and it consists of all elements of G
which can be expressed as a product of a string of elements of U and of their inverses.
In particular, a subset U of G is said to generate G if every element of G is
such a product. For example, we saw in (1.17) that the set $U = \{x, y\}$ generates the
symmetric group $S_3$. Proposition (2.18) of Chapter 1 shows that the elementary
matrices generate $GL_n$.





The Klein four group V is the simplest group which is not cyclic. It will appear
in many forms. For instance, it can be realized as the group consisting of the four
matrices

(2.11) $\begin{bmatrix} \pm 1 & \\ & \pm 1 \end{bmatrix}$.

Any two elements different from the identity generate V.

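The quaternion group H is another example of a small subgroup of $GL_2(\mathbb{C})$
which is not cyclic. It consists of the eight matrices

(2.12) $H = \{\pm\mathbf{1}, \pm\mathbf{i}, \pm\mathbf{j}, \pm\mathbf{k}\}$,

where

$\mathbf{1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$, $\mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, $\mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}$.

The two elements $\mathbf{i}, \mathbf{j}$ generate H, and computation leads to the formulas

(2.13) $\mathbf{i}^4 = \mathbf{1}$, $\mathbf{i}^2 = \mathbf{j}^2$, $\mathbf{j}\mathbf{i} = \mathbf{i}^3\mathbf{j}$.

These products determine the multiplication table of H.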
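The relations (2.13) can be checked by direct computation. The sketch below assumes the usual realization of the quaternion group by the complex 2 x 2 matrices i = diag(i, -i), j = [[0, 1], [-1, 0]], k = [[0, i], [i, 0]]; the helper names `matmul`, `power`, and `generate` are ours and are introduced only for this illustration.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )


def power(A, n):
    """The n-th power of A for n >= 0."""
    result = ONE
    for _ in range(n):
        result = matmul(result, A)
    return result


def generate(gens):
    """All products of strings of the given generators (finite case only)."""
    elems, frontier = [ONE], [ONE]
    while frontier:
        new = []
        for a in frontier:
            for g in gens:
                p = matmul(a, g)
                if p not in elems and p not in new:
                    new.append(p)
        elems.extend(new)
        frontier = new
    return elems


ONE = ((1, 0), (0, 1))
I = ((1j, 0), (0, -1j))
J = ((0, 1), (-1, 0))
K = ((0, 1j), (1j, 0))

print(power(I, 4) == ONE)                        # i^4 = 1
print(matmul(I, I) == matmul(J, J))              # i^2 = j^2
print(matmul(J, I) == matmul(power(I, 3), J))    # ji = i^3 j
print(len(generate([I, J])))                     # 8 elements in all
```

Since every element here has finite order, inverses are themselves products of the generators, which is why `generate` only needs to multiply by i and j.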


3. ISOMORPHISMS


Let G and G' be two groups. We want to say that they are isomorphic if all properties
of the group structure of G hold for G' as well, and conversely. For example, let
G be the set of real matrices of the form

$\begin{bmatrix} 1 & x \\ & 1 \end{bmatrix}$.

This is a subgroup of $GL_2(\mathbb{R})$, and the product of two such matrices is

$\begin{bmatrix} 1 & x \\ & 1 \end{bmatrix} \begin{bmatrix} 1 & y \\ & 1 \end{bmatrix} = \begin{bmatrix} 1 & x + y \\ & 1 \end{bmatrix}$.

The upper right entries of the matrices add when the matrices are multiplied, the rest
of the matrix being fixed. So when computing with such matrices, we need to keep
track of only the upper right entry. This fact is expressed formally by saying that the
group G is isomorphic to the additive group of real numbers.

How to make the concept of isomorphism precise will not be immediately
clear, but it turns out that the right way is to relate two groups by a bijective
correspondence between their elements, compatible with the laws of composition, that is, a
correspondence

(3.1) $G \longleftrightarrow G'$

having this property: If $a, b \in G$ correspond to $a', b' \in G'$, then the product ab in
G corresponds to the product a'b' in G'. When this happens, all properties of the
group structure carry over from one group to the other.

For example, the identity elements in isomorphic groups G and G' correspond.
To see this, say that the identity element 1 of G corresponds to an element $\epsilon'$ in G'.
Let a' be an arbitrary element of G', and let a be the corresponding element of G.
By assumption, products correspond to products. Since 1a = a in G, it follows that
$\epsilon' a' = a'$ in G'. In this way, one shows that $\epsilon' = 1'$. Another example: The orders
of corresponding elements are equal. If a corresponds to a' in G', then, since
the correspondence is compatible with multiplication, $a^r = 1$ if and only if
$a'^r = 1'$.

Since two isomorphic groups have the same properties, it is often convenient
to identify them with each other when speaking informally. For example, the symmetric
group $S_n$ of permutations of $\{1, \ldots, n\}$ is isomorphic to the group of permutation
matrices, a subgroup of $GL_n(\mathbb{R})$, and we often blur the distinction between
these two groups.

We usually write the correspondence (3.1) asymmetrically as a function, or
map $\varphi: G \to G'$. Thus an isomorphism $\varphi$ from G to G' is a bijective map which is
compatible with the laws of composition. If we write out what this compatibility
means using function notation for $\varphi$, we get the condition

(3.2) $\varphi(ab) = \varphi(a)\varphi(b)$, for all $a, b \in G$.

The left side of this equality means to multiply a and b in G and then apply $\varphi$, while
on the right the elements $\varphi(a)$ and $\varphi(b)$, which we denoted by a', b' before, are
multiplied in G'. We could also write this condition as

$(ab)' = a'b'$.

Of course, the choice of G as domain for this isomorphism is arbitrary. The inverse
function $\varphi^{-1}: G' \to G$ would serve just as well.

Two groups G and G' are called isomorphic if there exists an isomorphism
$\varphi: G \to G'$. We will sometimes indicate that two groups are isomorphic by the
symbol $\approx$:

(3.3) $G \approx G'$ means G is isomorphic to G'.

For example, let $C = \{\ldots, a^{-2}, a^{-1}, 1, a, a^2, \ldots\}$ be an infinite cyclic group.
Then the map

$\varphi: \mathbb{Z}^+ \to C$

defined by $\varphi(n) = a^n$ is an isomorphism. Since the notation is additive in the domain
and multiplicative in the range, condition (3.2) translates in this case to
$\varphi(m + n) = \varphi(m)\varphi(n)$, or

$a^{m+n} = a^m a^n$.
One more simple example:

Let $G = \{1, x, x^2, \ldots, x^{n-1}\}$ and $G' = \{1, y, y^2, \ldots, y^{n-1}\}$ be two cyclic groups,
generated by elements x, y of the same order. Then the map which sends $x^i$ to $y^i$ is an
isomorphism: Two cyclic groups of the same order are isomorphic.

Recapitulating, two groups G and G' are isomorphic if there exists an isomorphism
$\varphi: G \to G'$, a bijective map compatible with the laws of composition. The
groups isomorphic to a given group G form what is called the isomorphism class of
G, and any two groups in an isomorphism class are isomorphic. When one speaks of
classifying groups, what is meant is to describe the isomorphism classes. This is too
hard to do for all groups, but we will see later that there is, for example, one
isomorphism class of groups of order 3 [see (6.13)], and that there are two classes of
groups of order 4 and five classes of groups of order 12 [Chapter 6 (5.1)].

A confusing point about isomorphisms is that there exist isomorphisms from a
group G to itself:

$\varphi: G \to G$.

Such an isomorphism is called an automorphism of G. The identity map is an
automorphism, of course, but there are nearly always other automorphisms as well. For
example, let $G = \{1, x, x^2\}$ be a cyclic group of order 3, so that $x^3 = 1$. The
transposition which interchanges x and $x^2$ is an automorphism of G:

$1 \rightsquigarrow 1$
$x \rightsquigarrow x^2$
$x^2 \rightsquigarrow x$.

This is because $x^2$ is another element of order 3 in the group. If we call this element
y, the cyclic subgroup $\{1, y, y^2\}$ generated by y is the whole group G, because
$y^2 = x$. The automorphism compares the two realizations of G as a cyclic group.

The most important example of automorphism is conjugation: Let $b \in G$ be a
fixed element. Then conjugation by b is the map $\varphi$ from G to itself defined by

(3.4) $\varphi(x) = bxb^{-1}$.

This is an automorphism because, first of all, it is compatible with multiplication in
the group:

$\varphi(xy) = bxyb^{-1} = bxb^{-1}byb^{-1} = \varphi(x)\varphi(y)$,

and, secondly, it is a bijective map since it has an inverse function, namely conjugation
by $b^{-1}$. If the group is abelian, then conjugation is the identity map:
$bab^{-1} = abb^{-1} = a$. But any noncommutative group has some nontrivial conjugations,
and so it has nontrivial automorphisms.

The element $bab^{-1}$ is called the conjugate of a by b and will appear often. Two
elements a, a' of a group G are called conjugate if $a' = bab^{-1}$ for some $b \in G$.
The conjugate behaves in much the same way as the element a itself; for example, it
has the same order in the group. This follows from the fact that it is the image of a
by an automorphism.
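These statements about conjugation can be tested in a small group. The sketch below uses the symmetric group on three letters, with a permutation stored as a tuple p in which p[i] is the image of i; this representation and the helper names are assumptions made for the illustration.

```python
from itertools import permutations

S3 = list(permutations(range(3)))  # p[i] is the image of i


def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def order(p):
    identity = tuple(range(len(p)))
    n, power = 1, p
    while power != identity:
        power = compose(power, p)
        n += 1
    return n


def conjugate(b, a):
    """Return b a b^{-1}."""
    return compose(compose(b, a), inverse(b))


# Conjugation by b is compatible with multiplication ...
compatible = all(
    conjugate(b, compose(x, y)) == compose(conjugate(b, x), conjugate(b, y))
    for b in S3 for x in S3 for y in S3
)
# ... it is bijective ...
bijective = all(sorted(conjugate(b, x) for x in S3) == sorted(S3) for b in S3)
# ... and a conjugate has the same order as the original element.
same_order = all(order(conjugate(b, a)) == order(a) for b in S3 for a in S3)
print(compatible, bijective, same_order)  # True True True
```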
The conjugate has a useful, though trivial, interpretation. Namely, if we denote
$bab^{-1}$ by a', then

(3.5) $ba = a'b$.

So we can think of conjugation by b as the change in a which results when one
moves b from one side to the other.

4. HOMOMORPHISMS 

Let G, G' be groups. A homomorphism $\varphi: G \to G'$ is any map satisfying the rule

(4.1) $\varphi(ab) = \varphi(a)\varphi(b)$,

for all $a, b \in G$. This is the same requirement as for an isomorphism [see (3.2)].
The difference is that $\varphi$ is not assumed to be bijective here.

(4.2) Examples. The following maps are homomorphisms:

(a) the determinant function det: $GL_n(\mathbb{R}) \to \mathbb{R}^\times$;

(b) the sign of a permutation sign: $S_n \to \{\pm 1\}$ [see Chapter 1 (4.9)];

(c) the map $\varphi: \mathbb{Z}^+ \to G$ defined by $\varphi(n) = a^n$, where a is a fixed element of G;

(d) the inclusion map $i: H \to G$ of a subgroup H into a group G, defined by
$i(x) = x$.

(4.3) Proposition. A group homomorphism $\varphi: G \to G'$ carries the identity to
the identity, and inverses to inverses. In other words, $\varphi(1_G) = 1_{G'}$, and
$\varphi(a^{-1}) = \varphi(a)^{-1}$.

Proof. Since $1 = 1 \cdot 1$ and since $\varphi$ is a homomorphism,
$\varphi(1) = \varphi(1 \cdot 1) = \varphi(1)\varphi(1)$. Cancel $\varphi(1)$ from both sides by (1.12): $1 = \varphi(1)$.
Next, $\varphi(a^{-1})\varphi(a) = \varphi(a^{-1}a) = \varphi(1) = 1$, and similarly $\varphi(a)\varphi(a^{-1}) = 1$. Hence
$\varphi(a^{-1}) = \varphi(a)^{-1}$. □

Every group homomorphism $\varphi$ determines two important subgroups: its image
and its kernel. The image of a homomorphism $\varphi: G \to G'$ is easy to understand. It
is the image of the map

(4.4) $\operatorname{im} \varphi = \{x \in G' \mid x = \varphi(a) \text{ for some } a \in G\}$,

and it is a subgroup of G'. Another notation for the image is $\varphi(G)$. In Examples
(4.2a,b), the image is equal to the range of the map, but in Example (4.2c) it is the
cyclic subgroup of G generated by a, and in Example (4.2d) it is the subgroup H.

The kernel of $\varphi$ is more subtle. It is the set of elements of G which are mapped
to the identity in G':

(4.5) $\ker \varphi = \{a \in G \mid \varphi(a) = 1\}$,

which can also be described as the inverse image $\varphi^{-1}(1)$ of the identity element [see
Appendix (1.5)]. The kernel is a subgroup of G, because if a and b are in $\ker \varphi$,
then $\varphi(ab) = \varphi(a)\varphi(b) = 1 \cdot 1 = 1$, hence $ab \in \ker \varphi$, and so on.

The kernel of the determinant homomorphism is the subgroup of matrices
whose determinant is 1. This subgroup is called the special linear group and is
denoted by $SL_n(\mathbb{R})$:

(4.6) $SL_n(\mathbb{R}) = \{\text{real } n \times n \text{ matrices } A \mid \det A = 1\}$,

a subgroup of $GL_n(\mathbb{R})$. The kernel of the sign homomorphism in Example (4.2b)
above is called the alternating group and is denoted by $A_n$:

(4.7) $A_n = \{\text{even permutations}\}$,

a subgroup of $S_n$. The kernel of the homomorphism (4.2c) is the set of integers n
such that $a^n = 1$. That this is a subgroup of $\mathbb{Z}^+$ was proved before, in (2.8).

In addition to being a subgroup, the kernel of a homomorphism has an extra
property which is subtle but very important. Namely, if a is in $\ker \varphi$ and b is any
element of the group G, then the conjugate $bab^{-1}$ is in $\ker \varphi$. For to say $a \in \ker \varphi$
means $\varphi(a) = 1$. Then

$\varphi(bab^{-1}) = \varphi(b)\varphi(a)\varphi(b^{-1}) = \varphi(b) 1 \varphi(b)^{-1} = 1$,

so $bab^{-1} \in \ker \varphi$ too.

(4.8) Definition. A subgroup N of a group G is called a normal subgroup if it has
the following property: For every $a \in N$ and every $b \in G$, the conjugate $bab^{-1}$ is
in N.

As we have just seen,

(4.9) The kernel of a homomorphism is a normal subgroup.

Thus $SL_n(\mathbb{R})$ is a normal subgroup of $GL_n(\mathbb{R})$, and $A_n$ is a normal subgroup of $S_n$.
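As a check on a small case, the following sketch computes the kernel of the sign homomorphism on the symmetric group of four letters and verifies that it is closed under conjugation. The sign is computed here by counting inversions, which is one standard description and is assumed for this illustration; permutations are stored as tuples p with p[i] the image of i.

```python
from itertools import permutations


def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def sign(p):
    """+1 for an even permutation, -1 for an odd one (via inversion count)."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


S4 = list(permutations(range(4)))
A4 = [p for p in S4 if sign(p) == 1]  # the kernel of sign

is_hom = all(sign(compose(p, q)) == sign(p) * sign(q) for p in S4 for q in S4)
is_normal = all(
    compose(compose(b, a), inverse(b)) in A4 for a in A4 for b in S4
)
print(len(A4), is_hom, is_normal)  # 12 True True
```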

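Any subgroup of an abelian group G is normal, because when G is abelian,
$bab^{-1} = a$. But subgroups need not be normal in nonabelian groups. For example, the
group T of invertible upper triangular matrices is not a normal subgroup of $GL_2(\mathbb{R})$.
For let $A = \begin{bmatrix} 1 & 1 \\ & 1 \end{bmatrix}$ and $B = \begin{bmatrix} & 1 \\ 1 & \end{bmatrix}$. Then $BAB^{-1} = \begin{bmatrix} 1 & \\ 1 & 1 \end{bmatrix}$. Here $A \in T$ and
$B \in GL_2(\mathbb{R})$, but $BAB^{-1} \notin T$.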

The center of a group G, sometimes denoted by Z or by Z(G), is the set of elements
which commute with every element of G:

(4.10) $Z = \{z \in G \mid zx = xz \text{ for all } x \in G\}$.

The center of any group is a normal subgroup of the group. For example, it can be
shown that the center of $GL_n(\mathbb{R})$ is the group of scalar matrices, that is, those of the
form cI.
5. EQUIVALENCE RELATIONS AND PARTITIONS

A fundamental mathematical construction is to start with a set S and to form a new
set by equating certain elements of S according to a given rule. For instance, we
may divide the set of integers into two classes, the even integers and the odd integers.
Or we may wish to view congruent triangles in the plane as equivalent geometric
objects. This very general procedure arises in several ways, which we will
discuss here.

Let S be a set. By a partition P of S, we mean a subdivision of S into nonoverlapping
subsets:

(5.1) S = union of disjoint, nonempty subsets.

For example, the sets

{1, 3}, {2, 5}, {4}

form a partition of the set {1, 2, 3, 4, 5}. The two sets, of even integers and of odd
integers, form a partition of the set $\mathbb{Z}$ of all integers.

An equivalence relation on S is a relation which holds between certain elements
of S. We often write it as $a \sim b$ and speak of it as equivalence of a and b.

(5.2) An equivalence relation is required to be:

(i) transitive: If $a \sim b$ and $b \sim c$, then $a \sim c$;

(ii) symmetric: If $a \sim b$, then $b \sim a$;

(iii) reflexive: $a \sim a$ for all $a \in S$.

Congruence of triangles is an example of an equivalence relation on the set S of
triangles in the plane.

Formally, a relation on S is the same thing as a subset R of the set $S \times S$ of
pairs of elements; namely, the subset R consists of pairs (a, b) such that $a \sim b$.
In terms of this subset, we can write the axioms for an equivalence relation as follows:
(i) if $(a, b) \in R$ and $(b, c) \in R$, then $(a, c) \in R$; (ii) if $(a, b) \in R$, then
$(b, a) \in R$; and (iii) $(a, a) \in R$ for all a.

The notions of a partition of S and an equivalence relation on S are logically
equivalent, though in practice one is often presented with just one of the two. Given
a partition P on S, we can define an equivalence relation R by the rule $a \sim b$ if a
and b lie in the same subset of the partition. Axioms (5.2) are obviously satisfied.
Conversely, given an equivalence relation R, we can define a partition P this way:
The subset containing a is the set of all elements b such that $a \sim b$. This subset is
called the equivalence class of a, and S is partitioned into equivalence classes.

Let us check that the equivalence classes partition the set S. Call $C_a$ the equivalence
class of an element $a \in S$. So $C_a$ consists of the elements b such that $a \sim b$:

(5.3) $C_a = \{b \in S \mid a \sim b\}$.
The reflexive axiom tells us that $a \in C_a$. Therefore the classes $C_a$ are nonempty,
and since a can be any element, the classes cover S. The remaining property of a
partition which must be verified is that equivalence classes do not overlap. It is easy
to become confused here, because if $a \sim b$ then by definition $b \in C_a$. But $b \in C_b$
too. Doesn't this show that $C_a$ and $C_b$ overlap? We must remember that the symbol
$C_a$ is our notation for a subset of S defined in a certain way. The partition consists of
the subsets, not of the notations. It is true that $C_a$ and $C_b$ have the element b in common,
but that is all right because these are two notations for the same set. We will
show the following:

(5.4) Suppose that $C_a$ and $C_b$ have an element d in common. Then $C_a = C_b$.

Let us first show that if $a \sim b$ then $C_a = C_b$. To do so, let x be an arbitrary
element of $C_b$. Then $b \sim x$. Since $a \sim b$, transitivity shows that $a \sim x$, hence that
$x \in C_a$. Therefore $C_b \subset C_a$. The opposite inclusion follows from interchanging the
roles of a and b. To prove (5.4), suppose that d is in $C_a$ and in $C_b$; then $a \sim d$ and
$b \sim d$. Then by what has been shown, $C_a = C_d = C_b$, as required. □
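For a finite set, the passage from an equivalence relation to its partition can be carried out mechanically. The sketch below is one simple way to do it; the function name `equivalence_classes` and the test relation (same parity) are chosen only for illustration. Because of (5.4), it is enough to compare a new element with a single representative of each class found so far.

```python
def equivalence_classes(S, related):
    """Group the elements of S into classes C_a = {b in S : a ~ b}.

    `related(a, b)` is assumed to be an equivalence relation on S.
    """
    classes = []
    for a in S:
        for cls in classes:
            if related(cls[0], a):  # compare with one representative only
                cls.append(a)
                break
        else:
            classes.append([a])
    return classes


S = range(-4, 5)
classes = equivalence_classes(S, lambda a, b: (a - b) % 2 == 0)
print(classes)  # [[-4, -2, 0, 2, 4], [-3, -1, 1, 3]]
```

The classes are nonempty, they cover S, and they do not overlap, as the argument above requires.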

Suppose that an equivalence relation or a partition is given on a set S. Then we
may construct a new set $\bar S$ whose elements are the equivalence classes or the subsets
making up the partition. To simplify notation, the equivalence class of a, or the subset
of the partition containing a, is often denoted by $\bar a$. Thus $\bar a$ is an element of $\bar S$.
Notice that there is a natural surjective map

(5.5) $S \to \bar S$, which sends $a \rightsquigarrow \bar a$.

In our original example of the partition of $S = \mathbb{Z}$, the set $\bar S$ contains the two elements
(Even), (Odd), where the symbol (Even) represents the set of even integers
and (Odd) the set of odd integers. And $\bar 0 = \bar 2 = \bar 4$ and so on. So we can denote the
set (Even) by any one of these symbols. The map

(5.6) $\mathbb{Z} \to \{(\text{Even}), (\text{Odd})\}$

is the obvious one.

There are two ways to think of this construction. We can imagine putting the
elements of S into separate piles, one for each subset of the partition, and then
regarding the piles as the elements of a new set $\bar S$. The map $S \to \bar S$ associates each
element with its pile. Or we can think of changing what we mean by equality among
elements of S, interpreting $a \sim b$ to mean $a = b$ in $\bar S$. With this way of looking at
it, the elements in the two sets S and $\bar S$ correspond, but in $\bar S$ more of them are equal
to each other. It seems to me that this is the way we treat congruent triangles in
school. The bar notation (5.5) is well suited to this intuitive picture. We can work
with the same symbols as in S, but with bars over them to remind us of the new rule:

(5.7) $\bar a = \bar b$ means $a \sim b$.

This notation is often very convenient.
A disadvantage of the bar notation is that many symbols represent the same element
of $\bar S$. Sometimes this disadvantage can be overcome by choosing once and for
all a particular element, or a representative, in each equivalence class. For example,
it is customary to represent (Even) by $\bar 0$ and (Odd) by $\bar 1$:

(5.8) $\{(\text{Even}), (\text{Odd})\} = \{\bar 0, \bar 1\}$.

Though the pile picture is more immediate, the second way of viewing $\bar S$ is often
the better one, because operations on the piles are clumsy to visualize, whereas
the bar notation is well suited to algebraic manipulation.

Any map of sets $\varphi: S \to T$ defines an equivalence relation on the domain S,
namely the relation given by the rule $a \sim b$ if $\varphi(a) = \varphi(b)$. We will refer to this as
the equivalence relation determined by the map. The corresponding partition is
made up of the nonempty inverse images of the elements of T. By definition, the
inverse image of an element $t \in T$ is the subset of S consisting of all elements s such
that $\varphi(s) = t$. It is denoted symbolically as

(5.9) $\varphi^{-1}(t) = \{s \in S \mid \varphi(s) = t\}$.

Thus $\varphi^{-1}(t)$ is a subset of the domain S, determined by the element $t \in T$. (This is
symbolic notation. Please remember that $\varphi^{-1}$ is usually not a function.) The inverse
images may also be called the fibres of the map $\varphi$. The fibres $\varphi^{-1}(t)$ which are
nonempty, which means t is in the image of $\varphi$, form a partition of S. Here the set $\bar S$
of equivalence classes, which is the set of nonempty fibres, has another incarnation,
as the image $\operatorname{im} \varphi$ of the map. Namely, there is a bijective map

(5.10) $\bar\varphi: \bar S \to \operatorname{im} \varphi$,

the map which sends an element $\bar s$ of $\bar S$ to $\varphi(s)$.
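The fibres of a map on a finite set are easy to tabulate. In the sketch below, which is only an illustration, the fibres are stored in a dictionary whose keys are the elements of the image, so the bijection between nonempty fibres and the image is visible in the data structure itself. The name `fibres` and the squaring map are our own choices.

```python
from collections import defaultdict


def fibres(S, phi):
    """Map each t in the image of phi to its fibre {s in S : phi(s) == t}."""
    result = defaultdict(list)
    for s in S:
        result[phi(s)].append(s)
    return dict(result)


S = range(-3, 4)
fib = fibres(S, lambda s: s * s)
print(fib)  # {9: [-3, 3], 4: [-2, 2], 1: [-1, 1], 0: [0]}

# The nonempty fibres partition S: every element lies in exactly one fibre.
print(sorted(s for fibre in fib.values() for s in fibre) == list(S))  # True
```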

We now go back to group homomorphisms. Let $\varphi: G \to G'$ be a homomorphism,
and let us analyze the equivalence relation on G which is associated to the
map $\varphi$ or, equivalently, the fibres of the homomorphism. This relation is usually
denoted by $\equiv$, rather than by $\sim$, and is referred to as congruence:

(5.11) $a \equiv b$ if $\varphi(a) = \varphi(b)$.

For example, let $\varphi: \mathbb{C}^\times \to \mathbb{R}^\times$ be the absolute value homomorphism defined
by $\varphi(a) = |a|$. The induced equivalence relation is $a \equiv b$ if $|a| = |b|$. The fibres
of this map are the concentric circles about 0. They are in bijective correspondence
with elements of $\operatorname{im} \varphi$, the set of positive reals.

(5.12) Figure. Fibres of the absolute value map $\mathbb{C}^\times \to \mathbb{R}^\times$.
The relation (5.11) can be rewritten in a number of ways, of which the following
will be the most important for us:

(5.13) Proposition. Let $\varphi: G \to G'$ be a group homomorphism with kernel N,
and let a, b be elements of G. Then $\varphi(a) = \varphi(b)$ if and only if $b = an$ for some element
$n \in N$, or equivalently, if $a^{-1}b \in N$.

Proof. Suppose that $\varphi(a) = \varphi(b)$. Then $\varphi(a)^{-1}\varphi(b) = 1$, and since $\varphi$ is a
homomorphism we can use (4.1) and (4.3) to rewrite this equality as $\varphi(a^{-1}b) = 1$.
Now by definition, the kernel N is the set of all elements $x \in G$ such that $\varphi(x) = 1$.
Thus $a^{-1}b \in N$, or $a^{-1}b = n$ for some $n \in N$. Hence $b = an$, as required. Conversely,
if $b = an$ and $n \in N$, then $\varphi(b) = \varphi(a)\varphi(n) = \varphi(a) 1 = \varphi(a)$. □

The set of elements of the form an is denoted by aN and is called a coset of N
in G:

(5.14) $aN = \{g \in G \mid g = an \text{ for some } n \in N\}$.

So the coset aN is the set of all group elements b which are congruent to a.
The congruence relation $a \equiv b$ partitions the group G into congruence classes, the
cosets aN. They are the fibres of the map $\varphi$. In particular, the circles about the
origin depicted in (5.12) are cosets of the absolute value homomorphism.

(5.15) Figure. A schematic diagram of a group homomorphism.

An important case to look at is when the kernel is the trivial subgroup. In that
case (5.13) reads as follows:

(5.16) Corollary. A group homomorphism $\varphi: G \to G'$ is injective if and only if
its kernel is the trivial subgroup {1}. □

This gives us a way to verify that a homomorphism is an isomorphism. To do so, we
check that $\ker \varphi = \{1\}$, so that $\varphi$ is injective, and also that $\operatorname{im} \varphi = G'$, that is, that
$\varphi$ is surjective.
6. COSETS

One can define cosets for any subgroup H of a group G, not only for the kernel of a
homomorphism. A left coset is a subset of the form

(6.1) $aH = \{ah \mid h \in H\}$.

Note that the subgroup H is itself a coset, because H = 1H.

The cosets are equivalence classes for the congruence relation

(6.2) $a \equiv b$ if $b = ah$, for some $h \in H$.

Let us verify that congruence is an equivalence relation. Transitivity: Suppose that
$a \equiv b$ and $b \equiv c$. This means that $b = ah$ and $c = bh'$ for some $h, h' \in H$. Therefore
$c = ahh'$. Since H is a subgroup, $hh' \in H$. Thus $a \equiv c$. Symmetry: Suppose
$a \equiv b$, so that $b = ah$. Then $a = bh^{-1}$ and $h^{-1} \in H$, and so $b \equiv a$. Reflexivity:
$a = a1$ and $1 \in H$, so $a \equiv a$. Note that we have made use of all the defining properties
of a subgroup.

Since equivalence classes form a partition, we find the following:

(6.3) Corollary. The left cosets of a subgroup partition the group. □

(6.4) Note. The notation aH defines a certain subset of G. As with any equivalence
relation, different notations may represent the same subset. In fact, we know
that aH is the unique coset containing a, and so

(6.5) $aH = bH$ if and only if $a \equiv b$.

The corollary just restates (5.4):

(6.6) If aH and bH have an element in common, then they are equal.

For example, let G be the symmetric group $S_3$, with the presentation given in
(1.18): $G = \{1, x, x^2, y, xy, x^2y\}$. The element xy has order 2, and so it generates a
cyclic subgroup $H = \{1, xy\}$ of order 2. The left cosets of H in G are the three sets

(6.7) $\{1, xy\} = H = xyH$, $\{x, x^2y\} = xH = x^2yH$, $\{x^2, y\} = x^2H = yH$.

Notice that they do partition the group.

The number of left cosets of a subgroup is called the index of H in G and is
denoted by

(6.8) $[G : H]$.

Thus in our example the index is 3. Of course if G contains infinitely many elements,
the index may be infinite too.

Note that there is a bijective map from the subgroup H to the coset aH, sending
$h \rightsquigarrow ah$. (Why is this a bijective map?) Thus
(6.9) Each coset aH has the same number of elements as H does.

Since G is the union of the cosets of H and since these cosets do not overlap,
we obtain the important Counting Formula

(6.10) $|G| = |H|[G : H]$,

where |G| denotes the order of the group, as in (2.10), and where the equality has
the obvious meaning if some terms are infinite. In our example (6.7), this formula
reads $6 = 2 \cdot 3$.
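The coset computation in $S_3$ and the Counting Formula can be verified by enumeration. In the sketch below, permutations of {0, 1, 2} are stored as tuples p with p[i] the image of i; the particular choice x = (1, 2, 0), y = (1, 0, 2) is an assumption, made so that $x^3 = 1$, $y^2 = 1$, and $yx = x^2y$ hold.

```python
from itertools import permutations


def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


G = list(permutations(range(3)))
e, x, y = (0, 1, 2), (1, 2, 0), (1, 0, 2)
x2 = compose(x, x)
xy = compose(x, y)
H = [e, xy]  # cyclic subgroup of order 2 generated by xy


def left_coset(a, H):
    return frozenset(compose(a, h) for h in H)


left_cosets = {left_coset(a, G and H) for a in G}
index = len(left_cosets)

print(compose(y, x) == compose(x2, y))        # the relation yx = x^2 y holds
print(index, len(G) == len(H) * index)        # 3 True
print(left_coset(x2, H) == frozenset({x2, y}))  # x^2 H = {x^2, y}
```

Storing each coset as a `frozenset` makes the different notations aH for the same subset collapse to a single element of `left_cosets`.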

The fact that the two terms on the right side of equation (6.10) must divide the
left side is very important. Here is one of these conclusions, stated formally:

(6.11) Corollary. Lagrange's Theorem: Let G be a finite group, and let H be a
subgroup of G. The order of H divides the order of G. □

In Section 2 we defined the order of an element $a \in G$ to be the order of the
cyclic subgroup generated by a. Hence Lagrange's Theorem implies the following:

(6.12) The order of an element divides the order of the group.

This fact has a remarkable consequence:

(6.13) Corollary. Suppose that a group G has p elements and that p is a prime integer.
Let $a \in G$ be any element, not the identity. Then G is the cyclic group
$\{1, a, \ldots, a^{p-1}\}$ generated by a.

For, since $a \ne 1$, the order of a is greater than 1, and it divides $|G| = p$. Hence it
is equal to p. Since G has order p, $\{1, a, \ldots, a^{p-1}\}$ is the whole group. □

Thus we have classified all groups of prime order p. They form one isomorphism
class, the class of a cyclic group of order p.

The Counting Formula can also be applied when a homomorphism is given.
Let $\varphi: G \to G'$ be a homomorphism. As we saw in (5.13), the left cosets of $\ker \varphi$
are the fibres of the map $\varphi$. They are in bijective correspondence with the elements
in the image:

(6.14) $[G : \ker \varphi] = |\operatorname{im} \varphi|$.

Thus (6.10) implies the following:

(6.15) Corollary. Let $\varphi: G \to G'$ be a homomorphism of finite groups. Then

$|G| = |\ker \varphi| \cdot |\operatorname{im} \varphi|$.

Thus $|\ker \varphi|$ divides |G|, and $|\operatorname{im} \varphi|$ divides both |G| and |G'|.

Proof. The formula is obtained by combining (6.10) and (6.14), and it implies
that $|\ker \varphi|$ and $|\operatorname{im} \varphi|$ divide |G|. Since $\operatorname{im} \varphi$ is a subgroup of G', $|\operatorname{im} \varphi|$ divides
|G'| as well. □
Let us go back for a moment to the definition of cosets. We made the decision
to work with left cosets aH. One can also define right cosets of a subgroup H
and repeat the above discussion for them. The right cosets of a subgroup H are the
sets

(6.16) $Ha = \{ha \mid h \in H\}$,

which are equivalence classes for the relation (right congruence)

$a \equiv b$ if $b = ha$, for some $h \in H$.

Right cosets need not be the same as left cosets. For instance, the right cosets of the
subgroup {1, xy} of $S_3$ are

(6.17) $\{1, xy\} = H = Hxy$, $\{x, y\} = Hx = Hy$, $\{x^2, x^2y\} = Hx^2 = Hx^2y$.

This partition of $S_3$ is not the same as the partition (6.7) into left cosets.

However, if H is a normal subgroup, then right and left cosets agree.

(6.18) Proposition. A subgroup H of a group G is normal if and only if every left
coset is also a right coset. If H is normal, then aH = Ha for every $a \in G$.

Proof. Suppose that H is normal. For any $h \in H$ and any $a \in G$,

$ah = (aha^{-1})a$.

Since H is a normal subgroup, the conjugate element $k = aha^{-1}$ is in H. Thus the element
$ah = ka$ is in aH and also in Ha. This shows that $aH \subset Ha$. Similarly,
$aH \supset Ha$, and so these two cosets are equal. Conversely, suppose that H is not normal.
Then there are elements $h \in H$ and $a \in G$ so that $aha^{-1}$ is not in H. Then ah
is in the left coset aH but not in the right coset Ha. If it were, say $ah = h'a$ for
some $h' \in H$, then we would have $aha^{-1} = h' \in H$, contrary to our hypothesis.
On the other hand, aH and Ha do have an element in common, namely the element
a. So aH can't be in some other right coset. This shows that the partition into left
cosets is not the same as the partition into right cosets. □
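Proposition (6.18) gives a practical test for normality in a finite group: compare the two partitions. The sketch below applies it in $S_3$ to the subgroup {1, xy} and to the subgroup {1, x, x^2}. As before, the tuple representation of permutations and the choice x = (1, 2, 0), y = (1, 0, 2) are assumptions made for the illustration.

```python
from itertools import permutations


def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def left_cosets(G, H):
    return {frozenset(compose(a, h) for h in H) for a in G}


def right_cosets(G, H):
    return {frozenset(compose(h, a) for h in H) for a in G}


def is_normal(G, H):
    """Normality test of (6.18): the two coset partitions coincide."""
    return left_cosets(G, H) == right_cosets(G, H)


G = list(permutations(range(3)))
e, x, y = (0, 1, 2), (1, 2, 0), (1, 0, 2)
x2 = compose(x, x)
H = [e, compose(x, y)]  # the subgroup {1, xy}
N = [e, x, x2]          # the cyclic subgroup generated by x

print(frozenset({x, y}) in right_cosets(G, H))  # True: Hx = {x, y}
print(is_normal(G, H), is_normal(G, N))         # False True
```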


7. RESTRICTION OF A HOMOMORPHISM TO A SUBGROUP

The usual way to get an understanding of a complicated group is to study some less
complicated subgroups. If it made sense to single out one method in group theory as
the most important, this would be it. For example, the general linear group $GL_2$ is
much more complicated than the group of invertible upper triangular matrices. We
expect to answer any question about upper triangular matrices which comes up. And
by taking products of upper and lower triangular matrices, we can cover most of the
group $GL_2$. Of course, the trick is to get back information about a group from an
understanding of its subgroups. We don't have general rules about how this should be
done. But whenever a new construction with groups is made, we should study its effect
on subgroups. This is what is meant by restriction to a subgroup. We will do
this for subgroups and homomorphisms in this section.
Let H be a subgroup of a group G. Let us first consider the case that a second
subgroup K is given. The restriction of K to H is the intersection $K \cap H$. The following
proposition is a simple exercise.

(7.1) Proposition. The intersection $K \cap H$ of two subgroups is a subgroup of H.
If K is a normal subgroup of G, then $K \cap H$ is a normal subgroup of H. □

There is not very much more to be said here, but if G is a finite group, we may be
able to apply the Counting Formula (6.10), especially Lagrange's Theorem, to get
information about the intersection. Namely, $K \cap H$ is a subgroup of H and also a
subgroup of K. So its order divides both of the orders |H| and |K|. If |H| and |K|
have no common factor, we can conclude that $K \cap H = \{1\}$.

Now suppose that a homomorphism $\varphi: G \to G'$ is given and that H is a subgroup
of G as before. Then we may restrict $\varphi$ to H, obtaining a homomorphism

(7.2) $\varphi|_H: H \to G'$.

This means that we take the same map $\varphi$ but restrict its domain to H. In other
words, $\varphi|_H(h) = \varphi(h)$ for all $h \in H$. The restriction is a homomorphism because $\varphi$
is one.

The kernel of $\varphi|_H$ is the intersection of $\ker \varphi$ with H:

(7.3) $\ker \varphi|_H = (\ker \varphi) \cap H$.

This is clear from the definition of kernel: $\varphi(h) = 1$ if and only if $h \in \ker \varphi$.

Again, the Counting Formula may help to describe this restriction. For, the
image of $\varphi|_H$ is $\varphi(H)$. According to Corollary (6.15), $|\varphi(H)|$ divides both |H| and
|G'|. So if |H| and |G'| have no common factor, $\varphi(H) = \{1\}$. Then we can conclude
that $H \subset \ker \varphi$.

For example, the sign of a permutation is described by a homomorphism
(4.2b), $S_n \to \{\pm 1\}$. The range of this homomorphism has order 2, and its kernel is
the alternating group. If a subgroup H of $S_n$ has odd order, then the restriction of
this homomorphism to H is trivial, which means that H is contained in the alternating
group, that is, H consists of even permutations. This will be so when H is the
cyclic subgroup generated by a permutation p whose order in the group is odd. It follows
that every permutation of odd order is an even permutation. On the other hand,
we cannot make any conclusion about permutations of even order. They may be odd
or even.
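This conclusion is easy to confirm by enumeration in a small symmetric group. The sketch below checks it in $S_5$; the sign is computed by counting inversions, and permutations are stored as tuples p with p[i] the image of i, both of which are assumptions made for the illustration.

```python
from itertools import permutations


def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def order(p):
    identity = tuple(range(len(p)))
    n, power = 1, p
    while power != identity:
        power = compose(power, p)
        n += 1
    return n


def sign(p):
    """+1 for an even permutation, -1 for an odd one (via inversion count)."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


S5 = list(permutations(range(5)))

# Every permutation of odd order is even.
odd_order_is_even = all(sign(p) == 1 for p in S5 if order(p) % 2 == 1)
# Permutations of order 2 occur with both signs.
signs_of_order_2 = {sign(p) for p in S5 if order(p) == 2}
print(odd_order_is_even, sorted(signs_of_order_2))  # True [-1, 1]
```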

When a homomorphism $\varphi: G \to G'$ and a subgroup H' of G' are given, we
may also restrict $\varphi$ to H'. Here we must cut down the domain G of $\varphi$ suitably, in
order to get a map to H'. The natural thing to do is to cut down the domain as little
as possible by taking the entire inverse image of H':

(7.4) Proposition. Let $\varphi: G \to G'$ be a homomorphism, and let H' be a subgroup
of G'. Denote the inverse image $\varphi^{-1}(H') = \{x \in G \mid \varphi(x) \in H'\}$ by H.
Then

(a) H is a subgroup of G.

(b) If H' is a normal subgroup of G', then H is a normal subgroup of G.

(c) H contains $\ker \varphi$.

(d) The restriction of $\varphi$ to H defines a homomorphism $H \to H'$ whose kernel is
$\ker \varphi$.

For example, consider the determinant homomorphism $\det: GL_n(\mathbb{R}) \to \mathbb{R}^\times$.
The set $P$ of positive real numbers is a subgroup of $\mathbb{R}^\times$, and its inverse image is the
set of invertible $n \times n$ matrices with positive determinant, which is a normal
subgroup of $GL_n(\mathbb{R})$.

Proof of Proposition (7.4). This proof is also a simple exercise, but we must
keep in mind that $\varphi^{-1}$ is not a map. By definition, $H$ is the set of elements $x \in G$
such that $\varphi(x) \in H'$. We verify the conditions for a subgroup. Identity: $1 \in H$
because $\varphi(1) = 1 \in H'$. Closure: Suppose that $x, y \in H$. This means that $\varphi(x)$ and
$\varphi(y)$ are in $H'$. Since $H'$ is a subgroup, $\varphi(x)\varphi(y) \in H'$. Since $\varphi$ is a
homomorphism, $\varphi(x)\varphi(y) = \varphi(xy) \in H'$. Therefore $xy \in H$. Inverses: Suppose $x \in H$, so
that $\varphi(x) \in H'$; then $\varphi(x)^{-1} \in H'$ because $H'$ is a subgroup. Since $\varphi$ is a
homomorphism, $\varphi(x)^{-1} = \varphi(x^{-1})$. Thus $x^{-1} \in H$.

Suppose that $H'$ is a normal subgroup, and let $x \in H$ and $g \in G$. Then
$\varphi(gxg^{-1}) = \varphi(g)\varphi(x)\varphi(g)^{-1}$, and $\varphi(x) \in H'$. Therefore $\varphi(gxg^{-1}) \in H'$, and this
shows that $gxg^{-1} \in H$. Next, $H$ contains $\ker \varphi$ because if $x \in \ker \varphi$ then $\varphi(x) = 1$,
and $1 \in H'$. So $x \in \varphi^{-1}(H')$. The last assertion should be clear. □


8. PRODUCTS OF GROUPS


Let $G, G'$ be two groups. The product set $G \times G'$ can be made into a group by
component-wise multiplication. That is, we define multiplication of pairs by the rule

(8.1) $(a, a'), (b, b') \mapsto (ab, a'b')$,

for $a, b \in G$ and $a', b' \in G'$. The pair $(1, 1)$ is an identity, and
$(a, a')^{-1} = (a^{-1}, a'^{-1})$. The associative law in $G \times G'$ follows from the fact that it
holds in $G$ and in $G'$. The group thus obtained is called the product of $G$ and $G'$ and
is denoted by $G \times G'$. Its order is the product of the orders of $G$ and $G'$.

The product group is related to the two factors $G, G'$ in a simple way, which
we can sum up in terms of some homomorphisms

(8.2) $i: G \to G \times G'$, $i': G' \to G \times G'$, $p: G \times G' \to G$, $p': G \times G' \to G'$,

defined by

$i(x) = (x, 1)$, $i'(x') = (1, x')$,

$p(x, x') = x$, $p'(x, x') = x'$.

The maps $i, i'$ are injective and may be used to identify $G, G'$ with the subgroups
$G \times 1$, $1 \times G'$ of $G \times G'$. The maps $p, p'$ are surjective, $\ker p = 1 \times G'$, and
$\ker p' = G \times 1$. These maps are called the projections. Being kernels, $G \times 1$ and
$1 \times G'$ are normal subgroups of $G \times G'$.

(8.3) Proposition. The mapping property of products: Let $H$ be any group. The
homomorphisms $\Phi: H \to G \times G'$ are in bijective correspondence with pairs
$(\varphi, \varphi')$ of homomorphisms

$\varphi: H \to G$, $\varphi': H \to G'$.

The kernel of $\Phi$ is the intersection $(\ker \varphi) \cap (\ker \varphi')$.

Proof. Given a pair $(\varphi, \varphi')$ of homomorphisms, we define the corresponding
homomorphism

$\Phi: H \to G \times G'$

by the rule $\Phi(h) = (\varphi(h), \varphi'(h))$. This is easily seen to be a homomorphism.
Conversely, given $\Phi$, we obtain $\varphi$ and $\varphi'$ by composition with the projections, as

$\varphi = p\Phi$, $\varphi' = p'\Phi$.

Obviously, $\Phi(h) = (1, 1)$ if and only if $\varphi(h) = 1$ and $\varphi'(h) = 1$, which shows that
$\ker \Phi = (\ker \varphi) \cap (\ker \varphi')$. □

It is clearly desirable to decompose a given group $G$ as a product, meaning to find
two groups $H$ and $H'$ such that $G$ is isomorphic to the product $H \times H'$. For the
groups $H, H'$ will be smaller and therefore simpler, and the relation between
$H \times H'$ and its factors is easily understood. Unfortunately, it is quite rare that a
given group is a product, but it does happen occasionally.

For example, it is rather surprising that a cyclic group of order 6 can be
decomposed: A cyclic group $C_6$ of order 6 is isomorphic to the product $C_2 \times C_3$ of
cyclic groups of orders 2 and 3. This can be shown using the mapping property just
discussed. Say that $C_6 = \{1, x, x^2, \ldots, x^5\}$, $C_2 = \{1, y\}$, $C_3 = \{1, z, z^2\}$. The rule

$\varphi: C_6 \to C_2 \times C_3$

defined by $\varphi(x^i) = (y^i, z^i)$ is a homomorphism, and its kernel is the set of elements
$x^i$ such that $y^i = 1$ and $z^i = 1$. Now $y^i = 1$ if and only if $i$ is divisible by 2, while
$z^i = 1$ if and only if $i$ is divisible by 3. There is no integer between 1 and 5 which is
divisible by both 2 and 3. Therefore $\ker \varphi = \{1\}$, and $\varphi$ is injective. Since both
groups have order 6, $\varphi$ is bijective and hence is an isomorphism. □

The same argument works for a cyclic group of order $rs$, whenever the two
integers $r$ and $s$ have no common factor.

(8.4) Proposition. Let $r, s$ be integers with no common factor. A cyclic group of
order $rs$ is isomorphic to the product of a cyclic group of order $r$ and a cyclic group
of order $s$. □

On the other hand, a cyclic group of order 4 is not isomorphic to a product of two
cyclic groups of order 2. For it is easily seen that every element of $C_2 \times C_2$ has order
1 or 2, whereas a cyclic group of order 4 contains two elements of order 4. And, the
proposition makes no assertions about a group which is not cyclic.
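Both halves of this discussion can be tested numerically. The sketch below writes the cyclic groups additively, as integers modulo $n$, so that the map $x^i \mapsto (y^i, z^i)$ becomes $i \mapsto (i \bmod r, i \bmod s)$. This additive model and the function names are assumptions made for illustration only.

```python
from math import gcd

def crt_map(r, s):
    """Images of 0, 1, ..., rs-1 under i -> (i mod r, i mod s)."""
    return [(i % r, i % s) for i in range(r * s)]

def is_injective(r, s):
    """True if the map C_rs -> C_r x C_s has trivial kernel (distinct images)."""
    return len(set(crt_map(r, s))) == r * s

def lcm(a, b):
    return a * b // gcd(a, b)

def orders_cyclic(n):
    """Sorted list of element orders in a cyclic group of order n."""
    return sorted(n // gcd(n, i) for i in range(n))

def orders_product(r, s):
    """Sorted list of element orders in C_r x C_s."""
    return sorted(
        lcm(r // gcd(r, i), s // gcd(s, j)) for i in range(r) for j in range(s)
    )
```

Comparing `orders_cyclic(4)` with `orders_product(2, 2)` shows the obstruction described above: the first list contains elements of order 4 and the second does not.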

Let $A$ and $B$ be subsets of a group $G$. Then we denote the set of products of
elements of $A$ and $B$ by

(8.5) $AB = \{x \in G \mid x = ab \text{ for some } a \in A \text{ and } b \in B\}$.

The next proposition characterizes product groups.

(8.6) Proposition. Let $H$ and $K$ be subgroups of a group $G$.

(a) If $H \cap K = \{1\}$, the product map $p: H \times K \to G$ defined by $p(h, k) = hk$ is
injective. Its image is the subset $HK$.

(b) If either $H$ or $K$ is a normal subgroup of $G$, then the product sets $HK$ and $KH$
are equal, and $HK$ is a subgroup of $G$.

(c) If $H$ and $K$ are normal, $H \cap K = \{1\}$, and $HK = G$, then $G$ is isomorphic to
the product group $H \times K$.

Proof. (a) Let $(h_1, k_1)$, $(h_2, k_2)$ be elements of $H \times K$ such that $h_1k_1 = h_2k_2$.
Multiplying both sides of this equation on the left by $h_1^{-1}$ and on the right by $k_2^{-1}$,
we find $k_1k_2^{-1} = h_1^{-1}h_2$. Since $H \cap K = \{1\}$, $k_1k_2^{-1} = h_1^{-1}h_2 = 1$, hence $h_1 = h_2$
and $k_1 = k_2$. This shows that $p$ is injective.

(b) Suppose that $H$ is a normal subgroup of $G$, and let $h \in H$ and $k \in K$. Note that
$kh = (khk^{-1})k$. Since $H$ is normal, $khk^{-1} \in H$. Therefore $kh \in HK$, which shows
that $KH \subset HK$. The proof of the other inclusion is similar. The fact that $HK$ is a
subgroup now follows easily. For closure under multiplication, note that in a product
$(hk)(h'k') = h(kh')k'$, the middle term $kh'$ is in $KH = HK$, say $kh' = h''k''$. Then
$hkh'k' = (hh'')(k''k') \in HK$. Closure under inverses is similar: $(hk)^{-1} = k^{-1}h^{-1} \in
KH = HK$. And of course, $1 = 1 \cdot 1 \in HK$. Thus $HK$ is a subgroup. The proof is
similar in the case that $K$ is normal.

(c) Assume that both subgroups are normal and that $H \cap K = \{1\}$. Consider the
product $(hkh^{-1})k^{-1} = h(kh^{-1}k^{-1})$. Since $K$ is a normal subgroup, the left side is in $K$.
Since $H$ is normal, the right side is in $H$. Thus this product is in the intersection
$H \cap K$, i.e., $hkh^{-1}k^{-1} = 1$. Therefore $hk = kh$. This being known, the fact that
$p$ is a homomorphism follows directly: In the group $H \times K$, the product rule is
$(h_1, k_1)(h_2, k_2) = (h_1h_2, k_1k_2)$, and this element corresponds to $h_1h_2k_1k_2$ in $G$, while
in $G$ the products $h_1k_1$ and $h_2k_2$ multiply as $h_1k_1h_2k_2$. Since $h_2k_1 = k_1h_2$, the products
are equal. Part (a) shows that $p$ is injective, and the assumption that $HK = G$ shows
that $p$ is surjective. □

It is important to note that the product map $p: H \times K \to G$ will not be a
group homomorphism unless the two subgroups commute with each other.
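A small computation in $S_3$ illustrates why the normality hypotheses in Proposition (8.6) matter. This is a hedged sketch: the tuple representation of permutations, the helper names, and the particular subgroups chosen are illustrative assumptions, not part of the text.

```python
from itertools import permutations

def compose(p, q):
    """Product pq of permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def product_set(A, B):
    """The set AB = {ab : a in A, b in B}, as in (8.5)."""
    return {compose(a, b) for a in A for b in B}

def is_subgroup(S):
    """A nonempty finite subset closed under multiplication is a subgroup."""
    return bool(S) and all(compose(a, b) in S for a in S for b in S)

E = (0, 1, 2)
H = {E, (1, 0, 2)}               # generated by the transposition of 0 and 1
K = {E, (2, 1, 0)}               # generated by the transposition of 0 and 2
A3 = {E, (1, 2, 0), (2, 0, 1)}   # the cyclic permutations, a normal subgroup
S3 = set(permutations(range(3)))

HK, KH = product_set(H, K), product_set(K, H)
```

Here $H \cap K = \{1\}$, so the product map is injective and `HK` has four elements, but neither subgroup is normal: `HK` differs from `KH` and is not a subgroup. Replacing $K$ by the normal subgroup `A3` gives `product_set(H, A3) == product_set(A3, H)`, which is all of $S_3$.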

9. MODULAR ARITHMETIC 

In this section we discuss Gauss's definition of congruence of integers, which is one
of the most important concepts in number theory. We work with a fixed, but
arbitrary, positive integer $n$ throughout this section.

Two integers $a, b$ are said to be congruent modulo $n$, written

(9.1) $a \equiv b$ (modulo $n$),

if $n$ divides $b - a$, or if $b = a + nk$ for some integer $k$. It is easy to check that this
is an equivalence relation. So we may consider the equivalence classes, called
congruence classes modulo $n$ or residue classes modulo $n$, defined by this relation, as in
Section 5. Let us denote the congruence class of an integer $a$ by the symbol $\bar{a}$. It is
the set of integers

(9.2) $\bar{a} = \{\ldots, a - 2n, a - n, a, a + n, a + 2n, \ldots\}$.

If $a$ and $b$ are integers, the equation $\bar{a} = \bar{b}$ means that $n$ divides $b - a$.

The congruence class of 0 is the subgroup

$\bar{0} = n\mathbb{Z} = \{\ldots, -n, 0, n, 2n, \ldots\}$

of the additive group $\mathbb{Z}^+$ consisting of all multiples of $n$. The other congruence
classes are the cosets of this subgroup. Unfortunately, we have a slight notational
problem here, because the notation $n\mathbb{Z}$ is like the one we use for a coset. But $n\mathbb{Z}$
is not a coset; it is a subgroup of $\mathbb{Z}^+$. The notation for a coset of a subgroup $H$
analogous to (6.1), but using additive notation for the law of composition, is

$a + H = \{a + h \mid h \in H\}$.

In order to avoid writing a coset as $a + n\mathbb{Z}$, let us denote the subgroup $n\mathbb{Z}$ by $H$.
Then the cosets of $H$ are the sets

(9.3) $a + H = \{a + nk \mid k \in \mathbb{Z}\}$.

They are the congruence classes $\bar{a} = a + H$.

The $n$ integers $0, 1, \ldots, n - 1$ form a natural set of representative elements for
the congruence classes:

(9.4) Proposition. There are $n$ congruence classes modulo $n$, namely
$\bar{0}, \bar{1}, \ldots, \overline{n - 1}$.
Or, the index $[\mathbb{Z} : n\mathbb{Z}]$ of the subgroup $n\mathbb{Z}$ in $\mathbb{Z}$ is $n$.

Proof. Let $a$ be an arbitrary integer. Then we may use division with
remainder to write

$a = nq + r$,

where $q, r$ are integers and where the remainder $r$ is in the range $0 \le r < n$. Then $a$
is congruent to the remainder: $a \equiv r$ (modulo $n$). Thus $\bar{a} = \bar{r}$. This shows that $\bar{a}$ is
one of the congruence classes listed in the proposition. On the other hand, if $a$ and $b$
are distinct nonnegative integers less than $n$, say $a \le b$, then $b - a$ is less than $n$ and different
from zero, so $n$ does not divide $b - a$. Thus $a \not\equiv b$ (modulo $n$), which means that
$\bar{a} \neq \bar{b}$. Therefore the $n$ classes $\bar{0}, \bar{1}, \ldots, \overline{n - 1}$ are distinct. □
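Division with remainder is easy to experiment with. The sketch below relies on Python's built-in `divmod`, which for a positive modulus returns a remainder in the range $0 \le r < n$ even when $a$ is negative. The function names `residue` and `num_classes` and the sampling window are hypothetical choices for this illustration.

```python
def residue(a, n):
    """Canonical representative r of the class of a modulo n, with 0 <= r < n."""
    q, r = divmod(a, n)  # floor division, so r is nonnegative when n > 0
    assert a == n * q + r and 0 <= r < n
    return r

def num_classes(n, lo=-50, hi=50):
    """Number of distinct congruence classes met by the integers lo, ..., hi - 1."""
    return len({residue(a, n) for a in range(lo, hi)})
```

For any positive $n$ not exceeding the width of the window, `num_classes(n)` returns $n$, in agreement with Proposition (9.4).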

The main point about congruence classes is that addition and multiplication of
integers preserve congruences modulo $n$, and therefore these laws can be used to
define addition and multiplication of congruence classes. This is expressed by saying
that the set of congruence classes forms a ring. We will study rings in Chapter 10.

Let $\bar{a}$ and $\bar{b}$ be congruence classes represented by integers $a$ and $b$. Their sum
is defined to be the congruence class of $a + b$, and their product is defined to be the
class of $ab$. In other words, we define

(9.5) $\bar{a} + \bar{b} = \overline{a + b}$ and $\bar{a}\bar{b} = \overline{ab}$.

This definition needs some justification, because the same congruence class $\bar{a}$ can be
represented by many different integers. Any integer $a'$ congruent to $a$ modulo $n$
represents the same class. So it had better be true that if $a' \equiv a$ and $b' \equiv b$, then
$a' + b' \equiv a + b$ and $a'b' \equiv ab$. Fortunately, this is so.

(9.6) Lemma. If $a' \equiv a$ and $b' \equiv b$ (modulo $n$), then $a' + b' \equiv a + b$ (modulo
$n$) and $a'b' \equiv ab$ (modulo $n$).

Proof. Assume that $a' \equiv a$ and $b' \equiv b$, so that $a' = a + nr$ and
$b' = b + ns$ for some integers $r, s$. Then $a' + b' = a + b + n(r + s)$, which
shows that $a' + b' \equiv a + b$. Similarly, $a'b' = (a + nr)(b + ns) =
ab + n(as + rb + nrs)$, which shows that $a'b' \equiv ab$, as required. □
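Lemma (9.6) can be spot-checked by replacing $a$ and $b$ with other representatives $a + nr$ and $b + ns$ of the same classes. The following is a sketch only: the function name `lemma_holds`, the sampling bound, and the particular shifts $r, s$ are arbitrary assumptions, and a finite check of this kind illustrates the lemma without proving it.

```python
def lemma_holds(n, bound=20):
    """Check Lemma (9.6) for sample integers a, b and sample shifts r, s."""
    for a in range(-bound, bound):
        for b in range(-bound, bound):
            for r in (-2, -1, 0, 1, 3):
                for s in (-1, 0, 2):
                    a2, b2 = a + n * r, b + n * s  # other representatives
                    if (a2 + b2 - (a + b)) % n != 0:
                        return False
                    if (a2 * b2 - a * b) % n != 0:
                        return False
    return True
```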

The associative, commutative, and distributive laws hold for the laws of
composition (9.5) because they hold for addition and multiplication of integers. For
example, the formal verification of the distributive law is as follows:

$\bar{a}(\bar{b} + \bar{c}) = \bar{a}(\overline{b + c}) = \overline{a(b + c)}$ (definition of + and × for congruence classes)

$= \overline{ab + ac}$ (distributive law in the integers)

$= \overline{ab} + \overline{ac} = \bar{a}\bar{b} + \bar{a}\bar{c}$ (definition of + and × for congruence classes).

The set of congruence classes modulo $n$ is usually denoted by

(9.7) $\mathbb{Z}/n\mathbb{Z}$.

Computation of addition, subtraction, and multiplication in $\mathbb{Z}/n\mathbb{Z}$ can be made
explicitly by working with integers and taking remainders on division by $n$. That is
what the formulas (9.5) mean. They tell us that the map

(9.8) $\mathbb{Z} \to \mathbb{Z}/n\mathbb{Z}$

sending an integer $a$ to its congruence class $\bar{a}$ is compatible with addition and
multiplication. Therefore computations can be made in the integers and then carried over
to $\mathbb{Z}/n\mathbb{Z}$ at the end. However, doing this is not efficient, because computations are
simpler if the numbers are kept small. We can keep them small by computing the
remainder after some part of a computation has been made.

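Thus if $n = 13$, so that

$\mathbb{Z}/n\mathbb{Z} = \{\bar{0}, \bar{1}, \bar{2}, \ldots, \overline{12}\}$,

then

$(\bar{7} + \bar{9})(\overline{11} + \bar{6})$

can be computed as $\bar{7} + \bar{9} = \bar{3}$, $\overline{11} + \bar{6} = \bar{4}$, $\bar{3} \cdot \bar{4} = \overline{12}$.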
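The same computation can be carried out in two ways, reducing only at the end or reducing after each step, and the compatibility of the map (9.8) with addition and multiplication says the results must agree. This is a minimal sketch; the function name `compute_mod` is a hypothetical label for the example.

```python
def compute_mod(n=13):
    """Evaluate (7 + 9)(11 + 6) modulo n in two ways and return both results."""
    # Compute in the integers first, then reduce once at the end.
    direct = ((7 + 9) * (11 + 6)) % n
    # Reduce after every partial step, keeping the numbers small.
    left = (7 + 9) % n
    right = (11 + 6) % n
    stepwise = (left * right) % n
    return direct, stepwise
```

With $n = 13$ both entries of the returned pair are 12, matching the hand computation above.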

The bars over the numbers are a nuisance, so they are often left off. One just
has to remember the following rule:

(9.9) To say $a = b$ in $\mathbb{Z}/n\mathbb{Z}$ means $a \equiv b$ (modulo $n$).

10. QUOTIENT GROUPS 

We saw in the last section that the congruence classes of integers modulo $n$ are the
cosets of the subgroup $n\mathbb{Z}$ of $\mathbb{Z}^+$. So addition of congruence classes gives us a law
of composition on the set of these cosets. In this section we will show that a law of
composition can be defined on the cosets of a normal subgroup $N$ of any group $G$.
We will show how to make the set of cosets into a group, called a quotient group.

Addition of angles is a familiar example of the quotient construction. Every
real number represents an angle, and two real numbers represent the same angle if
they differ by an integer multiple of $2\pi$. This is very familiar. The point of the
example is that addition of angles is defined in terms of addition of real numbers. The
group of angles is a quotient group, in which $G = \mathbb{R}^+$ and $N$ is the subgroup of
integer multiples of $2\pi$.

We recall a notation introduced in Section 8: If $A$ and $B$ are subsets of a group
$G$, then

$AB = \{ab \mid a \in A, b \in B\}$.

We will call this the product of the two subsets of the group, though in other
contexts the term product may stand for the set $A \times B$.

(10.1) Lemma. Let $N$ be a normal subgroup of a group $G$. Then the product of
two cosets $aN$, $bN$ is again a coset, in fact

$(aN)(bN) = abN$.

Proof. Note that $Nb = bN$, by (6.18), and since $N$ is a subgroup $NN = N$.
The following formal manipulation proves the lemma:

$(aN)(bN) = a(Nb)N = a(bN)N = abNN = abN$. □

This lemma allows us to define multiplication of two cosets $C_1, C_2$ by this rule:
$C_1C_2$ is the product set. To compute the product coset, take any elements $a \in C_1$
and $b \in C_2$, so that $C_1 = aN$ and $C_2 = bN$. Then $C_1C_2 = abN$ is the coset
containing $ab$. This is the way addition of congruence classes was defined in the last section.

For example, consider the cosets of the unit circle $N$ in $G = \mathbb{C}^\times$. As we saw in
Section 5, its cosets are the concentric circles

$C_r = \{z \mid |z| = r\}$.

Formula (10.1) amounts to the assertion that if $|\alpha| = r$ and $|\beta| = s$, then
$|\alpha\beta| = rs$:

$C_rC_s = C_{rs}$.

The assumption that $N$ is a normal subgroup of $G$ is crucial to (10.1). If $H$
is not a normal subgroup of $G$, then there will be left cosets $C_1, C_2$ of $H$ in $G$ whose
products do not lie in a single left coset. For to say $H$ is not normal means there are
elements $h \in H$ and $a \in G$ so that $aha^{-1} \notin H$. Then the set

(10.2) $(aH)(a^{-1}H)$

does not lie in any left coset. It contains $a1a^{-1}1 = 1$, which is an element of $H$. So
if the set (10.2) is contained in a coset, that coset must be $H = 1H$. But it also
contains $aha^{-1}1$, which is not in $H$. □
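The failure described here can be seen concretely in $S_3$. The sketch below is an illustration under stated assumptions: permutations are tuples on $\{0, 1, 2\}$, the subgroup $H$ and the element $a$ are particular choices made for the example, and the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Product pq of permutations given as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def left_coset(a, H):
    return frozenset(compose(a, h) for h in H)

def product_set(A, B):
    return {compose(x, y) for x in A for y in B}

S3 = list(permutations(range(3)))
E = (0, 1, 2)
H = {E, (1, 0, 2)}               # order 2, not normal in S3
N = {E, (1, 2, 0), (2, 0, 1)}    # order 3, normal in S3
a = (2, 1, 0)

# The set (10.2) for this choice of a and H.
P = product_set(left_coset(a, H), left_coset(inverse(a), H))
cosets_of_H = {left_coset(g, H) for g in S3}
lies_in_single_coset = any(P <= c for c in cosets_of_H)

# For the normal subgroup N, formula (10.1) holds for every pair of cosets.
normal_case_ok = all(
    product_set(left_coset(x, N), left_coset(y, N)) == left_coset(compose(x, y), N)
    for x in S3 for y in S3
)
```

For this choice the set `P` has four elements, so it cannot fit inside any coset of $H$, each of which has two elements.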

It is customary to denote the set of cosets of a normal subgroup $N$ of $G$ by the
symbol

(10.3) $G/N$ = set of cosets of $N$ in $G$.

This agrees with the notation $\mathbb{Z}/n\mathbb{Z}$ introduced in Section 9. Another notation we
will frequently use for the set of cosets is the bar notation:

$G/N = \bar{G}$ and $aN = \bar{a}$,

so that $\bar{a}$ denotes the coset containing $a$. This is natural when we want to consider
the map

(10.4) $\pi: G \to \bar{G} = G/N$ sending $a \mapsto \bar{a} = aN$.

(10.5) Theorem. With the law of composition defined above, $\bar{G} = G/N$ is a
group, and the map $\pi$ (10.4) is a homomorphism whose kernel is $N$.
The order of $G/N$ is the index $[G : N]$ of $N$ in $G$.

(10.6) Corollary. Every normal subgroup of a group $G$ is the kernel of a
homomorphism. □

This corollary allows us to apply everything that we know about homomorphisms to
improve our understanding of normal subgroups.

Proof of Theorem (10.5). First note that $\pi$ is compatible with the laws of
composition: Since multiplication of cosets is defined by multiplication of elements,
$\pi(a)\pi(b) = \pi(ab)$. Moreover, the elements of $G$ having the same image as the
identity element 1 are those in $N$: $\bar{1} = 1N = N$. The group axioms in $\bar{G}$ follow from
Lemma (10.7):

(10.7) Lemma. Let $G$ be a group, and let $S$ be any set with a law of composition.
Let $\varphi: G \to S$ be a surjective map which has the property $\varphi(a)\varphi(b) = \varphi(ab)$ for
all $a, b$ in $G$. Then $S$ is a group.

Proof. Actually, any law concerning multiplication which holds in $G$ will be
carried over to $S$. The proof of the associative law is this: Let $s_1, s_2, s_3 \in S$. Since $\varphi$
is surjective, we know that $s_i = \varphi(a_i)$ for some $a_i \in G$. Then

$(s_1s_2)s_3 = (\varphi(a_1)\varphi(a_2))\varphi(a_3) = \varphi(a_1a_2)\varphi(a_3) = \varphi(a_1a_2a_3)$

$= \varphi(a_1)\varphi(a_2a_3) = \varphi(a_1)(\varphi(a_2)\varphi(a_3)) = s_1(s_2s_3)$.

We leave the other group axioms as an exercise. □



(10.8) Figure. A schematic diagram of coset multiplication. 

For example, let $G = \mathbb{R}^\times$ be the multiplicative group of nonzero real numbers,
and let $P$ be the subgroup of positive real numbers. There are two cosets, namely $P$
and $-P$ = {negative reals}, and $\bar{G} = G/P$ is the group of two elements. The
multiplication rule is the familiar rule: (Neg)(Neg) = (Pos), and so on.

The quotient group construction is related to a general homomorphism
$\varphi: G \to G'$ of groups as follows:

(10.9) Theorem. First Isomorphism Theorem: Let $\varphi: G \to G'$ be a surjective
group homomorphism, and let $N = \ker \varphi$. Then $G/N$ is isomorphic to $G'$ by the
map $\bar{\varphi}$ which sends the coset $\bar{a} = aN$ to $\varphi(a)$:

$\bar{\varphi}(\bar{a}) = \varphi(a)$.

This is our fundamental method of identifying quotient groups. For example, the
absolute value map $\mathbb{C}^\times \to \mathbb{R}^\times$ maps the nonzero complex numbers to the positive
real numbers, and its kernel is the unit circle $U$. So the quotient group $\mathbb{C}^\times/U$ is
isomorphic to the multiplicative group of positive real numbers. Or, the determinant is
a surjective homomorphism $GL_n(\mathbb{R}) \to \mathbb{R}^\times$, whose kernel is the special linear
group $SL_n(\mathbb{R})$. So the quotient $GL_n(\mathbb{R})/SL_n(\mathbb{R})$ is isomorphic to $\mathbb{R}^\times$.

Proof of the First Isomorphism Theorem. According to Proposition (5.13), the
nonempty fibres of $\varphi$ are the cosets $aN$. So we can think of $\bar{G}$ in either way, as the
set of cosets or as the set of nonempty fibres of $\varphi$. Therefore the map we are looking
for is the one defined in (5.10) for any map of sets. It maps $\bar{G}$ bijectively onto the
image of $\varphi$, which is equal to $G'$ because $\varphi$ is surjective. By construction it is
compatible with multiplication: $\bar{\varphi}(\bar{a}\bar{b}) = \varphi(ab) = \varphi(a)\varphi(b) = \bar{\varphi}(\bar{a})\bar{\varphi}(\bar{b})$. □
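The theorem can be traced through on a small additive example. The sketch below takes $G = \mathbb{Z}/12\mathbb{Z}$, $G' = \mathbb{Z}/4\mathbb{Z}$, and $\varphi(a) = a \bmod 4$. This particular homomorphism and all names in the code are assumptions chosen for illustration, with the groups written additively.

```python
n, m = 12, 4           # G = Z/12Z, G' = Z/4Z, written additively
G = range(n)

def phi(a):
    """Reduction modulo 4: a surjective homomorphism Z/12Z -> Z/4Z."""
    return a % m

# Kernel N of phi and the cosets a + N.
N = frozenset(a for a in G if phi(a) == 0)

def coset(a):
    return frozenset((a + x) % n for x in N)

cosets = {coset(a) for a in G}

# Induced map on cosets; it is well defined because phi is constant on each coset.
phi_bar = {}
for c in cosets:
    values = {phi(a) for a in c}
    assert len(values) == 1
    phi_bar[c] = values.pop()

# Compatibility with the law of composition on cosets.
compatible = all(
    phi_bar[coset((a + b) % n)] == (phi_bar[coset(a)] + phi_bar[coset(b)]) % m
    for a in G for b in G
)
```

In this example the kernel is $\{0, 4, 8\}$, there are four cosets, and `phi_bar` takes each of the values 0, 1, 2, 3 exactly once, so it is a bijection from the quotient onto $\mathbb{Z}/4\mathbb{Z}$.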

There are thus very many different kinds of quantities,
which cannot easily be enumerated;
and from this arise the parts of mathematics,
each of which is concerned with a particular kind of quantity.

Leonhard Euler 

EXERCISES 

1. The Definition of a Group

1. (a) Verify (1.17) and (1.18) by explicit computation.

(b) Make a multiplication table for

2. (a) Prove that $GL_n(\mathbb{R})$ is a group.

(b) Prove that $S_n$ is a group.

3. Let $S$ be a set with an associative law of composition and with an identity element.
Prove that the subset of $S$ consisting of invertible elements is a group.

4. Solve for $y$, given that $xyz^{-1}w = 1$ in a group.

5. Assume that the equation $xyz = 1$ holds in a group $G$. Does it follow that $yzx = 1$? That
$yxz = 1$?

6. Write out all ways in which one can form a product of four elements $a, b, c, d$ in the
given order.

7. Let $S$ be any set. Prove that the law of composition defined by $ab = a$ is associative.

8. Give an example of $2 \times 2$ matrices such that $A^{-1}B \neq BA^{-1}$.

9. Show that if $ab = a$ in a group, then $b = 1$, and if $ab = 1$, then $b = a^{-1}$.

10. Let $a, b$ be elements of a group $G$. Show that the equation $ax = b$ has a unique solution
in $G$.

11. Let $G$ be a group, with multiplicative notation. We define an opposite group $G^\circ$ with law
of composition $a \circ b$ as follows: The underlying set is the same as $G$, but the law of
composition is the opposite; that is, we define $a \circ b = ba$. Prove that this defines a group.

2. Subgroups


1. Determine the elements of the cyclic group generated by the matrix 


-1 0 


explicitly. 


2. Let $a, b$ be elements of a group $G$. Assume that $a$ has order 5 and that $a^3b = ba^3$. Prove
that $ab = ba$.

3. Which of the following are subgroups?

(a) $GL_n(\mathbb{R}) \subset GL_n(\mathbb{C})$.

(b) $\{1, -1\} \subset \mathbb{R}^\times$.

(c) The set of positive integers in $\mathbb{Z}^+$.

(d) The set of positive reals in $\mathbb{R}^\times$.

(e) The set of all matrices [ … ], with $a \neq 0$, in $GL_2(\mathbb{R})$.

4. Prove that a nonempty subset $H$ of a group $G$ is a subgroup if for all $x, y \in H$ the
element $xy^{-1}$ is also in $H$.

5. An $n$th root of unity is a complex number $z$ such that $z^n = 1$. Prove that the $n$th roots of
unity form a cyclic subgroup of $\mathbb{C}^\times$ of order $n$.

6. (a) Find generators and relations analogous to (2.13) for the Klein four group.

(b) Find all subgroups of the Klein four group.

7. Let $a$ and $b$ be integers.

(a) Prove that the subset $a\mathbb{Z} + b\mathbb{Z}$ is a subgroup of $\mathbb{Z}^+$.

(b) Prove that $a$ and $b + 7a$ generate the subgroup $a\mathbb{Z} + b\mathbb{Z}$.

8. Make a multiplication table for the quaternion group $H$.

9. Let $H$ be the subgroup generated by two elements $a, b$ of a group $G$. Prove that if
$ab = ba$, then $H$ is an abelian group.

10. (a) Assume that an element $x$ of a group has order $rs$. Find the order of $x^r$.
(b) Assuming that $x$ has arbitrary order $n$, what is the order of $x^r$?

11. Prove that in any group the orders of $ab$ and of $ba$ are equal.

12. Describe all groups G which contain no proper subgroup. 

13. Prove that every subgroup of a cyclic group is cyclic. 

14. Let G be a cyclic group of order n, and let r be an integer dividing n. Prove that G con¬ 
tains exactly one subgroup of order r. 

15. (a) In the definition of subgroup, the identity element in H is required to be the identity 

of G. One might require only that H have an identity element, not that it is the same 
as the identity in G. Show that if H has an identity at all, then it is the identity in G, 
so this definition would be equivalent to the one given. 

(b) Show the analogous thing for inverses. 

16. (a) Let G be a cyclic group of order 6. How many of its elements generate G? 

(b) Answer the same question for cyclic groups of order 5,8, and 10. 

(c) How many elements of a cyclic group of order n are generators for that group? 

17. Prove that a group in which every element except the identity has order 2 is abelian.

18. According to Chapter 1 (2.18), the elementary matrices generate $GL_n(\mathbb{R})$.

(a) Prove that the elementary matrices of the first and third types suffice to generate this
group.

(b) The special linear group $SL_n(\mathbb{R})$ is the set of real $n \times n$ matrices whose determinant
is 1. Show that $SL_n(\mathbb{R})$ is a subgroup of $GL_n(\mathbb{R})$.

*(c) Use row reduction to prove that the elementary matrices of the first type generate
$SL_n(\mathbb{R})$. Do the $2 \times 2$ case first.

19. Determine the number of elements of order 2 in the symmetric group $S_4$.

20. (a) Let $a, b$ be elements of an abelian group of orders $m, n$ respectively. What can you
say about the order of their product $ab$?

*(b) Show by example that the product of elements of finite order in a nonabelian group 
need not have finite order. 

21. Prove that the set of elements of finite order in an abelian group is a subgroup. 

22. Prove that the greatest common divisor of a and b, as defined in the text，can be obtained 
by factoring a and b into primes and collecting the common factors. 


3. Isomorphisms 


1. Prove that the additive group of real numbers is isomorphic to the multiplicative 
group P of positive reals. 

2. Prove that the products ab and ba are conjugate elements in a group. 

3. Let $a, b$ be elements of a group $G$, and let $a' = bab^{-1}$. Prove that $a = a'$ if and only if $a$
and $b$ commute.


4. (a) Let $b' = aba^{-1}$. Prove that $b'^n = ab^na^{-1}$.
(b) Prove that if $aba^{-1} = b^2$, then $a^3ba^{-3} = b^8$.

5. Let $\varphi: G \to G'$ be an isomorphism of groups. Prove that the inverse function $\varphi^{-1}$ is
also an isomorphism.

6. Let $\varphi: G \to G'$ be an isomorphism of groups, let $x, y \in G$, and let $x' = \varphi(x)$ and
$y' = \varphi(y)$.

(a) Prove that the orders of $x$ and of $x'$ are equal.

(b) Prove that if $xyx = yxy$, then $x'y'x' = y'x'y'$.

(c) Prove that $\varphi(x^{-1}) = x'^{-1}$.


7. Prove that the matrices [ … ] are conjugate elements in the group $GL_2(\mathbb{R})$ but
that they are not conjugate when regarded as elements of $SL_2(\mathbb{R})$.


8. Prove that the matrices 

9. Find an isomorphism from a group $G$ to its opposite group $G^\circ$ (Section 2, exercise 12).

10. Prove that the map is an automorphism of $GL_n(\mathbb{R})$.

11. Prove that the set Aut $G$ of automorphisms of a group $G$ forms a group, the law of
composition being composition of functions.

12. Let $G$ be a group, and let $\varphi: G \to G$ be the map $\varphi(x) = x^{-1}$.

(a) Prove that $\varphi$ is bijective.

(b) Prove that $\varphi$ is an automorphism if and only if $G$ is abelian.

13. (a) Let $G$ be a group of order 4. Prove that every element of $G$ has order 1, 2, or 4.

(b) Classify groups of order 4 by considering the following two cases:

(i) $G$ contains an element of order 4.

(ii) Every element of $G$ has order < 4.

14. Determine the group of automorphisms of the following groups.

(a) $\mathbb{Z}^+$, (b) a cyclic group of order 10, (c) $S_3$.


「 n 

1 


"l 3 

2 

_ ^ 


2 


are conjugate in $GL_2(\mathbb{R})$.

15. Show that the functions $f = 1/x$, $g = (x - 1)/x$ generate a group of functions, the law
of composition being composition of functions, which is isomorphic to the symmetric
group $S_3$.

16. Give an example of two isomorphic groups such that there is more than one isomorphism 
between them. 

4. Homomorphisms 

1. Let $G$ be a group, with law of composition written $x \# y$. Let $H$ be a group with law of
composition $u \circ v$. What is the condition for a map $\varphi: G \to H$ to be a
homomorphism?

2. Let $\varphi: G \to G'$ be a group homomorphism. Prove that for any elements $a_1, \ldots, a_k$ of
$G$, $\varphi(a_1 \cdots a_k) = \varphi(a_1) \cdots \varphi(a_k)$.

3. Prove that the kernel and image of a homomorphism are subgroups.

4. Describe all homomorphisms $\varphi: \mathbb{Z}^+ \to \mathbb{Z}^+$, and determine which are injective, which
are surjective, and which are isomorphisms.

5. Let $G$ be an abelian group. Prove that the $n$th power map $\varphi: G \to G$ defined by
$\varphi(x) = x^n$ is a homomorphism from $G$ to itself.

6. Let $f: \mathbb{R}^+ \to \mathbb{C}^\times$ be the map $f(x) =$ … . Prove that $f$ is a homomorphism, and
determine its kernel and image.

7. Prove that the absolute value map $|\ |: \mathbb{C}^\times \to \mathbb{R}^\times$ sending $\alpha \mapsto |\alpha|$ is a
homomorphism, and determine its kernel and image.

8. (a) Find all subgroups of $S_3$, and determine which are normal.

(b) Find all subgroups of the quaternion group, and determine which are normal.

9. (a) Prove that the composition $\varphi \circ \psi$ of two homomorphisms $\varphi, \psi$ is a homomorphism.
(b) Describe the kernel of $\varphi \circ \psi$.

10. Let $\varphi: G \to G'$ be a group homomorphism. Prove that $\varphi(x) = \varphi(y)$ if and only if
$xy^{-1} \in \ker \varphi$.

11. Let $G, H$ be cyclic groups, generated by elements $x, y$. Determine the condition on the
orders $m, n$ of $x$ and $y$ so that the map sending is a group homomorphism.

12. Prove that the $n \times n$ matrices $M$ which have the block form $\begin{bmatrix} A & B \\ 0 & D \end{bmatrix}$ with $A \in GL_r(\mathbb{R})$
and $D \in GL_{n-r}(\mathbb{R})$ form a subgroup $P$ of $GL_n(\mathbb{R})$, and that the map $P \to GL_r(\mathbb{R})$
sending $M \mapsto A$ is a homomorphism. What is its kernel?

13. (a) Let $H$ be a subgroup of $G$, and let $g \in G$. The conjugate subgroup $gHg^{-1}$ is defined
to be the set of all conjugates $ghg^{-1}$, where $h \in H$. Prove that $gHg^{-1}$ is a subgroup of
$G$.

(b) Prove that a subgroup $H$ of a group $G$ is normal if and only if $gHg^{-1} = H$ for all
$g \in G$.

14. Let $N$ be a normal subgroup of $G$, and let $g \in G$, $n \in N$. Prove that $g^{-1}ng \in N$.

15. Let $\varphi$ and $\psi$ be two homomorphisms from a group $G$ to another group $G'$, and let
$H \subset G$ be the subset $\{x \in G \mid \varphi(x) = \psi(x)\}$. Prove or disprove: $H$ is a subgroup of $G$.

16. Let $\varphi: G \to G'$ be a group homomorphism, and let $x \in G$ be an element of order $r$.
What can you say about the order of $\varphi(x)$?

17. Prove that the center of a group is a normal subgroup.

18. Prove that the center of $GL_n(\mathbb{R})$ is the subgroup $Z = \{cI \mid c \in \mathbb{R}, c \neq 0\}$.

19. Prove that if a group contains exactly one element of order 2, then that element is in the
center of the group.

20. Consider the set $U$ of real $3 \times 3$ matrices of the form

1 * *
  1 *
    1

(a) Prove that $U$ is a subgroup of $SL_n(\mathbb{R})$.

(b) Prove or disprove: $U$ is normal.

*(c) Determine the center of $U$.

21. Prove by giving an explicit example that $GL_2(\mathbb{R})$ is not a normal subgroup of $GL_2(\mathbb{C})$.

22. Let $\varphi: G \to G'$ be a surjective homomorphism.

(a) Assume that $G$ is cyclic. Prove that $G'$ is cyclic.

(b) Assume that $G$ is abelian. Prove that $G'$ is abelian.

23. Let $\varphi: G \to G'$ be a surjective homomorphism, and let $N$ be a normal subgroup of $G$.
Prove that $\varphi(N)$ is a normal subgroup of $G'$.

5. Equivalence Relations and Partitions

1. Prove that the nonempty fibres of a map form a partition of the domain.

2. Let $S$ be a set of groups. Prove that the relation $G \sim H$ if $G$ is isomorphic to $H$ is an
equivalence relation on $S$.

3. Determine the number of equivalence relations on a set of five elements.

4. Is the intersection $R \cap R'$ of two equivalence relations $R, R' \subset S \times S$ an equivalence
relation? Is the union?

5. Let $H$ be a subgroup of a group $G$. Prove that the relation defined by the rule $a \sim b$ if
$b^{-1}a \in H$ is an equivalence relation on $G$.

6. (a) Prove that the relation $x$ conjugate to $y$ in a group $G$ is an equivalence relation on $G$.
(b) Describe the elements $a$ whose conjugacy class (= equivalence class) consists of the
element $a$ alone.

7. Let $R$ be a relation on the set $\mathbb{R}$ of real numbers. We may view $R$ as a subset of the
$(x, y)$-plane. Explain the geometric meaning of the reflexive and symmetric properties.

8. With each of the following subsets $R$ of the $(x, y)$-plane, determine which of the axioms
(5.2) are satisfied and whether or not $R$ is an equivalence relation on the set $\mathbb{R}$ of real
numbers.

(a) $R = \{(s, s) \mid s \in \mathbb{R}\}$.

(b) $R$ = empty set.

(c) $R$ = locus $\{y = 0\}$.

(d) $R$ = locus $\{xy + 1 = 0\}$.

(e) $R$ = locus $\{x^2y - xy^2 - x + y = 0\}$.

(f) $R$ = locus $\{x^2 - xy + 2x - 2y = 0\}$.

9. Describe the smallest equivalence relation on the set of real numbers which contains the
line $x - y = 1$ in the $(x, y)$-plane, and sketch it.

10. Draw the fibres of the map from the $(x, z)$-plane to the $y$-axis defined by the map $y = zx$.

11. Work out rules, obtained from the rules on the integers, for addition and multiplication
on the set (5.8).

12. Prove that the cosets (5.14) are the fibres of the map $\varphi$.

6^ Cosets 

1. Determine the index [Z : nX]. 

2. Prove directly that distinct cosets do not overlap. 

3. Prove that every group whose order is a power of a prime p contains an element of order 

4. Give an example showing that left cosets and right cosets of GL 2 (U) in GL 2 (C) are not 
always equal. 

5. Let H, K be subgroups of a group G of orders 3, 5 respectively. Prove that 

// n 欠 = 

6. Justify (6A 5) carefully, 

7. (a) Let G be an abelian group of odd order. Prove that the map <p: G - > G defined by 

— x 2 is an automorphism. 

(b) Generalize the result of (a). 

8. Let W be the additive subgroup of ℝ^m of solutions of a system of homogeneous linear
equations AX = 0. Show that the solutions of an inhomogeneous system AX = B form a
coset of W.

9. Let H be a subgroup of a group G. Prove that the number of left cosets is equal to the
number of right cosets (a) if G is finite and (b) in general.

10. (a) Prove that every subgroup of index 2 is normal. 

(b) Give an example of a subgroup of index 3 which is not normal. 

11. Classify groups of order 6 by analyzing the following three cases:

(a) G contains an element of order 6.

(b) G contains an element of order 3 but none of order 6.

(c) All elements of G have order 1 or 2.

12. Let G, H be the following subgroups of GL_2(ℝ):



An element of G can be represented by a point in the (x, y)-plane. Draw the partitions of
the plane into left and into right cosets of H.

7. Restriction of a Homomorphism to a Subgroup

1. Let G and G' be finite groups whose orders have no common factor. Prove that the only
homomorphism φ: G → G' is the trivial one φ(x) = 1 for all x.

2. Give an example of a permutation of even order which is odd and an example of one 
which is even. 

3. (a) Let H and K be subgroups of a group G. Prove that the intersection xH ∩ yK of two
cosets of H and K is either empty or else is a coset of the subgroup H ∩ K.

(b) Prove that if H and K have finite index in G then H ∩ K also has finite index.

4. Prove Proposition (7.1).

5. Let H, N be subgroups of a group G, with N normal. Prove that HN = NH and that this
set is a subgroup.

6. Let φ: G → G' be a group homomorphism with kernel K, and let H be another subgroup
of G. Describe φ^{-1}(φ(H)) in terms of H and K.

7. Prove that a group of order 30 can have at most 7 subgroups of order 5.

*8. Prove the Correspondence Theorem: Let φ: G → G' be a surjective group homomorphism
with kernel N. The set of subgroups H' of G' is in bijective correspondence with
the set of subgroups H of G which contain N, the correspondence being defined by the
maps H ↦ φ(H) and H' ↦ φ^{-1}(H'). Moreover, normal subgroups of G correspond
to normal subgroups of G'.

9. Let G and G' be cyclic groups of orders 12 and 6 generated by elements x, y respectively,
and let φ: G → G' be the map defined by φ(x^i) = y^i. Exhibit the correspondence
referred to in the previous problem explicitly.

8. Products of Groups 

1. Let G, G' be groups. What is the order of the product group G × G'?

2. Is the symmetric group S_3 a direct product of nontrivial groups?

3. Prove that a finite cyclic group of order rs is isomorphic to the product of cyclic groups
of orders r and s if and only if r and s have no common factor.

4. In each of the following cases, determine whether or not G is isomorphic to the product 
of H and K. 

(a) G = ℝ^×, H = {±1}, K = {positive real numbers}.

(b) G = {invertible upper triangular 2 × 2 matrices}, H = {invertible diagonal matrices},
K = {upper triangular matrices with diagonal entries 1}.

(c) G = ℂ^× and H = {unit circle}, K = {positive reals}.

5. Prove that the product of two infinite cyclic groups is not infinite cyclic. 

6. Prove that the center of the product of two groups is the product of their centers. 

7. (a) Let H, K be subgroups of a group G. Show that the set of products
HK = {hk | h ∈ H, k ∈ K} is a subgroup if and only if HK = KH.

(b) Give an example of a group G and two subgroups H, K such that HK is not a subgroup.

8. Let G be a group containing normal subgroups of orders 3 and 5 respectively. Prove that 
G contains an element of order 15. 

9. Let G be a finite group whose order is a product of two integers: n = ab. Let H, K be
subgroups of G of orders a and b respectively. Assume that H ∩ K = {1}. Prove that
HK = G. Is G isomorphic to the product group H × K?

10. Let x ∈ G have order m, and let y ∈ G' have order n. What is the order of (x, y) in
G × G'?

11. Let H be a subgroup of a group G, and let φ: G → H be a homomorphism whose
restriction to H is the identity map: φ(h) = h, if h ∈ H. Let N = ker φ.

(a) Prove that if G is abelian then it is isomorphic to the product group H × N.

(b) Find a bijective map G → H × N without the assumption that G is abelian, but
show by an example that G need not be isomorphic to the product group.

9. Modular Arithmetic

1. Compute (7 + 14)(3 − 16) modulo 17.

2. (a) Prove that the square a^2 of an integer a is congruent to 0 or 1 modulo 4.

(b) What are the possible values of a^2 modulo 8?

3. (a) Prove that 2 has no inverse modulo 6. 

(b) Determine all integers n such that 2 has an inverse modulo n. 

4. Prove that every integer a is congruent to the sum of its decimal digits modulo 9. 

5. Solve the congruence 2x ≡ 5 (a) modulo 9 and (b) modulo 6.

6. Determine the integers n for which the congruences x + y ≡ 2, 2x − 3y ≡ 3 (modulo n)
have a solution.

7. Prove the associative and commutative laws for multiplication in ℤ/nℤ.

8. Use Proposition (2.6) to prove the Chinese Remainder Theorem: Let m, n, a, b be integers,
and assume that the greatest common divisor of m and n is 1. Then there is an
integer x such that x ≡ a (modulo m) and x ≡ b (modulo n).

10. Quotient Groups 

1. Let G be the group of invertible real upper triangular 2x2 matrices. Determine whether 
or not the following conditions describe normal subgroups H of G. If they do, use the 
First Isomorphism Theorem to identify the quotient group G/H. 

(a) a_11 = 1 (b) a_12 = 0 (c) a_11 = a_22 (d) a_11 = a_22 = 1

2. Write out the proof of (10.1) in terms of elements. 

3. Let P be a partition of a group G with the property that for any pair of elements A, B of 
the partition, the product set AB is contained entirely within another element C of the 
partition. Let N be the element of P which contains 1. Prove that N is a normal subgroup
of G and that P is the set of its cosets. 

4. (a) Consider the presentation (1.17) of the symmetric group S_3. Let H be the subgroup
{1, y}. Compute the product sets (1H)(xH) and (1H)(x^2 H), and verify that they are
not cosets.

(b) Show that a cyclic group of order 6 has two generators satisfying the rules x^3 = 1,
y^2 = 1, yx = xy.

(c) Repeat the computation of (a), replacing the relations (1.18) by the relations given in
part (b). Explain.

5. Identify the quotient group ℝ^×/P, where P denotes the subgroup of positive real numbers.

6. Let H = {±1, ±i} be the subgroup of G = ℂ^× of fourth roots of unity. Describe the
cosets of H in G explicitly, and prove that G/H is isomorphic to G.

7. Find all normal subgroups N of the quaternion group H, and identify the quotients H/N.

8. Prove that the subset H of G = GL_n(ℝ) of matrices whose determinant is positive forms
a normal subgroup, and describe the quotient group G/H.

9. Prove that the subset G × 1 of the product group G × G' is a normal subgroup isomorphic
to G and that (G × G')/(G × 1) is isomorphic to G'.

10. Describe the quotient groups ℂ^×/P and ℂ^×/U, where U is the subgroup of complex
numbers of absolute value 1 and P denotes the positive reals.

11. Prove that the groups ℝ^+/ℤ^+ and ℝ^+/2πℤ^+ are isomorphic.

Miscellaneous Problems

1. What is the product of all mth roots of unity in ℂ?

2. Compute the group of automorphisms of the quaternion group.

3. Prove that a group of even order contains an element of order 2.

4. Let K ⊂ H ⊂ G be subgroups of a finite group G. Prove the formula
[G : K] = [G : H][H : K].

*5. A semigroup S is a set with an associative law of composition and with an identity. But
elements are not required to have inverses, so the cancellation law need not hold. The
semigroup S is said to be generated by an element s if the set {1, s, s^2, ...} of nonnegative
powers of s is the whole set S. For example, the relations s^2 = 1 and s^2 = s describe two
different semigroup structures on the set {1, s}. Define isomorphism of semigroups, and
describe all isomorphism classes of semigroups having a generator.

6. Let S be a semigroup with finitely many elements which satisfies the Cancellation Law
(1.12). Prove that S is a group.

*7. Let a = (a_1, ..., a_k) and b = (b_1, ..., b_k) be points in k-dimensional space ℝ^k. A path
from a to b is a continuous function on the interval [0, 1] with values in ℝ^k, that is, a
function f: [0, 1] → ℝ^k, such that f(0) = a and f(1) = b. If S is a subset of ℝ^k and
if a, b ∈ S, we define a ~ b if a and b can be joined
by a path lying entirely in S.

(a) Show that this is an equivalence relation on S. Be careful to check that the paths you 
construct stay within the set S. 

(b) A subset S of ℝ^k is called path connected if a ~ b for any two points a, b ∈ S.
Show that every subset S is partitioned into path-connected subsets with the property
that two points in different subsets can not be connected by a path in S.

(c) Which of the following loci in ℝ^2 are path-connected? {x^2 + y^2 = 1}, {xy = 0},
{xy = 1}.

*8. The set of n × n matrices can be identified with the space ℝ^{n×n}. Let G be a subgroup of
GL_n(ℝ). Prove each of the following.

(a) If A, B, C, D ∈ G, and if there are paths in G from A to B and from C to D, then there
is a path in G from AC to BD.

(b) The set of matrices which can be joined to the identity I forms a normal subgroup of
G (called the connected component of G).

*9. (a) Using the fact that SL_n(ℝ) is generated by elementary matrices of the first type (see
exercise 18, Section 2), prove that this group is path-connected.

(b) Show that GL_n(ℝ) is a union of two path-connected subsets, and describe them.

10. Let H, K be subgroups of a group G, and let g ∈ G. The set

HgK = {x ∈ G | x = hgk for some h ∈ H, k ∈ K}

is called a double coset.

(a) Prove that the double cosets partition G.

(b) Do all double cosets have the same order?

11. Let H be a subgroup of a group G. Show that the double cosets HgH are the left cosets
gH if H is normal, but that if H is not normal then there is a double coset which properly
contains a left coset.

*12. Prove that the double cosets in GL_n(ℝ) of the subgroups H = {lower triangular matrices}
and K = {upper triangular matrices} are the sets HPK, where P is a permutation matrix.

1. REAL VECTOR SPACES


The basic models for vector spaces are the spaces of n-dimensional row or column
vectors:

ℝ^n: the set of row vectors v = (a_1, ..., a_n), or the set of column vectors

$$v = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}.$$

Though row vectors take less space to write, the definition of matrix multiplication
makes column vectors more convenient for us. So we will work with column vectors
most of the time. To save space, we will occasionally write a column vector in
the form (a_1, ..., a_n)^t.

For the present we will study only two operations: 


(1.1)

vector addition:

$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix},$$

and scalar multiplication:

$$c \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} ca_1 \\ \vdots \\ ca_n \end{bmatrix}.$$

These operations make ℝ^n into a vector space. Before going to the formal definition
of a vector space, let us look at some other examples: nonempty subsets of ℝ^n
closed under the operations (1.1). Such a subset is called a subspace.

(1.2) Example. The subspaces W of the space ℝ^2 are of three types:

(i) the zero vector alone: W = {0};

(ii) the vectors lying on a line L through the origin;

(iii) the whole space: W = ℝ^2.

This can be seen from the parallelogram law for addition of vectors. If W contains
two vectors w_1, w_2 not lying on one line, then every vector v can be obtained from
these two vectors as a “linear combination”

c_1 w_1 + c_2 w_2,

where c_1, c_2 are scalars. So W = ℝ^2 in this case. If W does not contain two such
vectors, then we are in one of the remaining cases. □




Similarly, it can be shown that the subspaces of ℝ^3 are of four types:

(i) the zero vector;

(ii) the vectors lying on a line through the origin;

(iii) the vectors lying in a plane through the origin;

(iv) the whole space ℝ^3.

This classification of subspaces of ℝ^2 and ℝ^3 will be clarified in Section 4 by the
concept of dimension.

Systems of homogeneous linear equations furnish many examples. The set of
solutions of such a system is always a subspace. For, if we write the system in matrix
notation as AX = 0, where A is an m × n matrix and X is a column vector, then it is
clear that

(a) AX = 0 and AY = 0 imply A(X + Y) = 0. In other words, if X and Y are solutions,
so is X + Y.

(b) AX = 0 implies AcX = 0: If X is a solution, so is cX.

For example, let W be the set of solutions of the equation

(1.3) 2x_1 − x_2 − 2x_3 = 0, or AX = 0,

where A = [2 −1 −2]. This space is the set of vectors lying in the plane through the
origin and orthogonal to A. Every solution is a linear combination c_1 w_1 + c_2 w_2 of
two particular solutions w_1, w_2. Most pairs of solutions, for example


(1.4)

$$w_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad w_2 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix},$$

will span the space of solutions in this way. Thus every solution has the form 


(1.5)

$$c_1 w_1 + c_2 w_2 = \begin{bmatrix} c_1 + c_2 \\ 2c_2 \\ c_1 \end{bmatrix},$$

where c_1, c_2 are arbitrary constants. Another choice of the particular solutions w_1, w_2
would result in a different but equivalent description of the space of all solutions.
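
The description (1.5) can be checked numerically. The following is a minimal sketch, assuming the equation (1.3) and the vectors (1.4); the helper names `dot`, `combo`, and `coefficients` are illustrative and not part of the text.

```python
from fractions import Fraction

# Coefficient row of equation (1.3): 2*x1 - x2 - 2*x3 = 0.
A = [2, -1, -2]
# The two particular solutions (1.4).
w1 = [1, 0, 1]
w2 = [1, 2, 0]


def dot(u, v):
    """Ordinary dot product of two lists of numbers."""
    return sum(a * b for a, b in zip(u, v))


def combo(c1, c2):
    """The linear combination c1*w1 + c2*w2, as in (1.5)."""
    return [c1 * a + c2 * b for a, b in zip(w1, w2)]


def coefficients(x):
    """Read off (c1, c2) from a solution x = (c1 + c2, 2*c2, c1)."""
    c1 = Fraction(x[2])
    c2 = Fraction(x[1], 2)
    return c1, c2
```

For instance, `x = [0, -2, 1]` satisfies (1.3), `coefficients(x)` returns `(1, -1)`, and `combo(1, -1)` reproduces `x`. Reading the coefficients from the last two coordinates works only for this particular pair (w_1, w_2); a different pair of solutions would need a different formula.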

(1.6) Definition. A real vector space is a set V together with two laws of composition:

(a) Addition: V × V → V, written v, w ↦ v + w

(b) Scalar multiplication: ℝ × V → V, written c, v ↦ cv

These laws of composition must satisfy the following axioms:

(i) Addition makes V into an abelian group.

(ii) Scalar multiplication is associative with multiplication of real numbers:

(ab)v = a(bv).

(iii) Scalar multiplication by the real number 1 is the identity operation:

1v = v.

(iv) Two distributive laws hold:

(a + b)v = av + bv
a(v + w) = av + aw.

Of course all the axioms should be quantified universally; that is, they are assumed
to hold for all a, b ∈ ℝ and all v, w ∈ V.

The identity element for the addition law in V is denoted by 0, or by 0_V if there
is danger of confusing the zero vector with the number zero.

Notice that scalar multiplication associates to every pair consisting of a real
number c and a vector v another vector cv. Such a rule is called an external law of
composition on the vector space.

Multiplication of two vectors is not a part of the structure, though various
products, such as the cross product of vectors in ℝ^3, can be defined. These products
aren’t completely intrinsic; they depend on choosing coordinates. So they are considered
to be additional structure on the vector space.

Read axiom (ii) carefully. The left side means multiply a and b as real numbers,
then scalar multiply ab and v, to get a vector. On the right side, both operations
are scalar multiplication.

The two laws of composition are related by the essential distributive laws.
Note that in the first distributive law the symbol + on the left stands for addition of
real numbers, while on the right, it stands for addition of vectors.

(1.7) Proposition. The following identities hold in a vector space V:

(a) 0_ℝ v = 0_V, for all v ∈ V,

(b) c 0_V = 0_V, for all c ∈ ℝ,

(c) (−1)v = −v, for all v ∈ V.

Proof. To see (a), we use the distributive law to write

0v + 0v = (0 + 0)v = 0v = 0v + 0.

Cancelling 0v from both sides, we obtain 0v = 0. Please go through this carefully,
noting which symbols 0 refer to the number and which refer to the vector.

Similarly, c0 + c0 = c(0 + 0) = c0. Hence c0 = 0. Finally,

v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0.

Hence (−1)v is the additive inverse of v. □

(1.8) Examples.

(a) A subspace of ℝ^n is a vector space, with the laws of composition induced from
those on ℝ^n.

(b) Let V = ℂ be the set of complex numbers. Forget multiplication of complex
numbers, and keep only addition α + β and multiplication cα of a complex
number α by a real number c. These operations make ℂ into a real vector
space.

(c) The set of real polynomials p(x) = a_n x^n + ... + a_0 is a vector space, with
addition of polynomials and multiplication of polynomials by scalars as its
laws of composition.

(d) Let V be the set of continuous real-valued functions on the interval [0, 1]. Look
only at the operations of addition of functions f + g and multiplication of
functions by numbers cf. This makes V a real vector space.

Note that each of our examples has more structure than we look at when we
view it as a vector space. This is typical. Any particular example is sure to have
some extra features which distinguish it from others, but this is not a drawback of the
definition. On the contrary, the strength of the abstract approach lies in the fact that
consequences of the general axioms can be applied to many different examples.


2. ABSTRACT FIELDS 


It is convenient to treat the real and complex cases simultaneously in linear algebra.
This can be done by listing the properties of the “scalars” which are needed
axiomatically, and doing so leads to the notion of a field.

It used to be customary to speak only of subfields of the complex numbers. A
subfield of ℂ is any subset which is closed under the four operations addition,
subtraction, multiplication, and division, and which contains 1. In other words, F is a
subfield of ℂ if the following properties hold:

(2.1)

(a) If a, b ∈ F, then a + b ∈ F.

(b) If a ∈ F, then −a ∈ F.

(c) If a, b ∈ F, then ab ∈ F.

(d) If a ∈ F and a ≠ 0, then a^{-1} ∈ F.

(e) 1 ∈ F.


Note that we can use axioms (a), (b), and (e) to conclude that 1 − 1 = 0 is an element
of F. Thus F is a subset which is a subgroup of ℂ^+ under addition and such
that F − {0} = F^× is a subgroup of ℂ^× under multiplication. Conversely, any such
subset is a subfield.

Here are some examples of subfields of C: 

(2.2) Examples.

(a) F = ℝ, the field of real numbers.

(b) F = ℚ, the field of rational numbers (= fractions of integers).

(c) F = ℚ[√2], the field of all complex numbers of the form a + b√2, where
a, b ∈ ℚ.

It is a good exercise to check axioms (2.1) for the last example. 
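
A short computation can accompany that exercise, though it does not replace the proof. The sketch below assumes an element a + b√2 of ℚ[√2] is stored as a pair (a, b) of rational numbers; the function names `mul` and `inv` are illustrative. It shows the two closure properties that need an argument, (c) and (d) of (2.1): a product and an inverse are again of the form a + b√2.

```python
from fractions import Fraction


def mul(x, y):
    """(a + b*sqrt2)(c + d*sqrt2) = (ac + 2bd) + (ad + bc)*sqrt2."""
    a, b = x
    c, d = y
    return (a * c + 2 * b * d, a * d + b * c)


def inv(x):
    """Inverse of a + b*sqrt2, obtained by multiplying by a - b*sqrt2.

    The denominator a^2 - 2b^2 is nonzero for (a, b) != (0, 0),
    because sqrt2 is irrational.
    """
    a, b = x
    n = a * a - 2 * b * b
    if n == 0:
        raise ZeroDivisionError("the zero element has no inverse")
    return (a / n, -b / n)
```

For example, with `x = (Fraction(1), Fraction(1))`, which stands for 1 + √2, `inv(x)` is `(-1, 1)`, that is, −1 + √2, and `mul(x, inv(x))` is `(1, 0)`.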

These days, it is customary to introduce fields abstractly. The notion of an abstract
field is harder to grasp than that of a subfield of ℂ, but it contains important
new classes of fields, including finite fields.

(2.3) Definition. A field F is a set together with two laws of composition

F × F → F, written a, b ↦ a + b, and F × F → F, written a, b ↦ ab,

called addition and multiplication, and satisfying the following axioms:

(i) Addition makes F into an abelian group F^+. Its identity element is denoted
by 0.

(ii) Multiplication is associative and commutative and makes F^× = F − {0} into a
group. Its identity element is denoted by 1.

(iii) Distributive law: For all a, b, c ∈ F, (a + b)c = ac + bc.

The first two axioms describe properties of the two laws of composition, addition
and multiplication, separately. The third axiom, the distributive law, is the one
which relates addition to multiplication. This axiom is crucial, because if the two
laws were unrelated, we could just as well study each of them separately. Of course
we know that the real numbers satisfy these axioms, but the fact that they are all that
is needed for arithmetic operations can only be understood after some experience in
working with them.

One can operate with matrices A whose entries a_ij are in any field F. The discussion
of Chapter 1 can be repeated without change, and you should go back to
look at this material again with this in mind.

The simplest examples of fields besides the subfields of the complex numbers
are certain finite fields called the prime fields, which we will now describe. We saw
in Section 9 of Chapter 2 that the set ℤ/nℤ of congruence classes modulo n has laws
of addition and multiplication derived from addition and multiplication of integers.
Now all of the axioms for a field hold for the integers, except for the existence of
multiplicative inverses in axiom (2.3ii). The integers are not closed under division.
And as we have already remarked, such axioms carry over to addition and multiplication
of congruence classes. But there is no reason to suppose that multiplicative inverses
will exist for congruence classes, and in fact they need not. The class of 2, for
example, does not have a multiplicative inverse modulo 6. So it is a surprising fact
that if p is a prime integer then all nonzero congruence classes modulo p have inverses,
and therefore the set ℤ/pℤ is a field. This field is called a prime field and is
usually denoted by 𝔽_p:

(2.4) $\mathbb{F}_p = \{\bar{0}, \bar{1}, \ldots, \overline{p-1}\} = \mathbb{Z}/p\mathbb{Z}.$

(2.5) Theorem. Let p be a prime integer. Every nonzero congruence class ā
(modulo p) has a multiplicative inverse, and hence 𝔽_p is a field with p elements.

The theorem can also be stated as follows:

(2.6) Let p be a prime, and let a be any integer not divisible by p.

There is an integer b such that ab ≡ 1 (modulo p).

For ab ≡ 1 (modulo p) is the same as $\bar{a}\bar{b} = \overline{ab} = \bar{1}$, which means that b̄ is the
multiplicative inverse of ā.

For example, let p = 13 and ā = 6. Then ā^{-1} = 11 because

6 · 11 = 66 ≡ 1 (modulo 13).


Finding the inverse of a congruence class ā (modulo p) is not easy in general, but it
can be done by trial and error if p is small. A systematic way is to compute the powers
of ā. Since every nonzero congruence class has an inverse, the set of all of them
forms a finite group of order p − 1, usually denoted by 𝔽_p^×. So every element ā has
finite order dividing p − 1. Thus if p = 13 and ā = 3, we find ā^2 = 9, and
ā^3 = 27 = 1, which shows that ā has order 3. We are lucky: ā^{-1} = ā^2 = 9. On the
other hand, if we had tried this method with ā = 6, we would have found that 6 has
order 12. The computation would have been lengthy.
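
The method of powers is easy to mechanize. The following is a minimal sketch, assuming p is prime and a is not divisible by p; the function name `order_and_inverse` is illustrative. It multiplies by a until the power 1 appears and returns the previous power as the inverse.

```python
def order_and_inverse(a, p):
    """Return (k, b) where k is the order of a modulo p and b = a^(k-1).

    Assumes p is prime and p does not divide a, so that some power of a
    is congruent to 1 and the loop terminates.
    """
    a %= p
    if a == 0:
        raise ValueError("the zero class has no inverse")
    previous, power, k = 1, a, 1     # previous = a^(k-1), power = a^k
    while power != 1:
        previous = power
        power = power * a % p
        k += 1
    return k, previous
```

With p = 13, `order_and_inverse(3, 13)` gives `(3, 9)` and `order_and_inverse(6, 13)` gives `(12, 11)`, in agreement with the two cases discussed above. The loop takes as many steps as the order of a, so it is practical only for small p.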

Proof of Theorem (2.5). Let ā ∈ 𝔽_p be any nonzero element, and let us use
the method just discussed to show that ā has an inverse. We consider the powers
1, ā, ā^2, ā^3, .... Since there are infinitely many powers and only finitely many elements
in 𝔽_p, there must be two powers which are equal, say ā^m = ā^n, where
m < n. At this point, we would like to cancel ā^m, to obtain 1 = ā^{n−m}. Once this
cancellation is justified, we will have shown that ā^{n−m−1} is the inverse of ā. This
will complete the proof.

Here is the cancellation law we need:


(2.7) Lemma. Cancellation Law: Let ā, c̄, d̄ be elements of 𝔽_p with ā ≠ 0. If
āc̄ = ād̄, then c̄ = d̄.

Proof. Set b̄ = c̄ − d̄. Then the statement of the lemma becomes: If āb̄ = 0
and ā ≠ 0, then b̄ = 0. To prove this, we represent the congruence classes ā, b̄ by
integers a, b. Then what has to be shown is the following intuitively plausible fact:

(2.8) Lemma. Let p be a prime integer and let a, b be integers. If p divides the
product ab, then p divides a or p divides b.

Proof. Suppose that p does not divide a, but that p divides ab. We must show
that p divides b. Since p is a prime, 1 and p are the only positive integers which divide
it. Since p does not divide a, the only common divisor of p and a is 1. So 1 is
their greatest common divisor. By Proposition (2.6) of Chapter 2, there are integers
r, s so that 1 = rp + sa. Multiply both sides by b: b = rpb + sab. Both of the
terms on the right side of this equality are divisible by p; hence the left side b is divisible
by p too, as was to be shown. □
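
The relation 1 = rp + sa used in this proof also yields an inverse directly: reducing it modulo p gives sa ≡ 1. The sketch below assumes the standard extended Euclidean algorithm as the way to find r and s (the text only cites their existence); the names `bezout` and `inverse_mod` are illustrative.

```python
def bezout(m, n):
    """Return (g, r, s) with g = gcd(m, n) = r*m + s*n, for m, n >= 0."""
    r0, s0, r1, s1 = 1, 0, 0, 1
    while n != 0:
        q = m // n
        m, n = n, m - q * n
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
    return m, r0, s0


def inverse_mod(a, n):
    """Inverse of a modulo n, if gcd(a, n) = 1; otherwise raise ValueError."""
    g, r, s = bezout(n, a % n)       # g = r*n + s*a
    if g != 1:
        raise ValueError("a has no inverse modulo n")
    return s % n
```

For p = 13 and a = 6, `bezout(13, 6)` returns `(1, 1, -2)`, that is, 1 = 1·13 − 2·6, so `inverse_mod(6, 13)` is −2 ≡ 11. Calling `inverse_mod(2, 6)` raises an error, matching the remark that the class of 2 has no inverse modulo 6.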

As with congruences in general, computations in the field 𝔽_p can be made by
working with integers, except that division can not be carried out in the integers.
This difficulty can often be handled by putting everything on a common denominator
in such a way that the required division is left until the end. For example, suppose
we ask for solutions of a system of n linear equations in n unknowns, in the field 𝔽_p.
We represent the system of equations by an integer system, choosing representatives
for the residue classes in a convenient way. Say that the integer system is AX = B,
where A is an n × n integer matrix and B is an integer column vector. Then to solve
the system in 𝔽_p, we try to invert the matrix A modulo p. Cramer’s Rule,
(adj A)A = δI, where δ = det A, is a formula valid in the integers [Chapter 1 (5.7)],
and therefore it also holds in 𝔽_p when the matrix entries are replaced by their congruence
classes. If the residue class of δ is not zero, then we can invert the matrix A
in 𝔽_p by computing δ^{-1}(adj A).

(2.9) Corollary. Consider a system AX = B of n linear equations in n unknowns
where the entries of A, B are in 𝔽_p. The system has a unique solution in 𝔽_p if
det A ≠ 0 in 𝔽_p. □

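For example, consider the system of linear equations AX = B, where

$$A = \begin{bmatrix} 8 & 3 \\ 2 & 6 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 \\ -1 \end{bmatrix}.$$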

Since the coefficients are integers, they define a system of equations in 𝔽_p for any
prime p. The determinant of A is 42, so the system has a unique solution in 𝔽_p for all
p different from 2, 3 and 7. Thus if p = 13, we find det A = 3 when evaluated
(modulo 13). We already saw that 3^{-1} = 9 in 𝔽_13. So we can use Cramer’s Rule to
compute


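$$A^{-1} = \begin{bmatrix} 2 & -1 \\ -5 & 7 \end{bmatrix} \quad\text{and}\quad X = A^{-1}B = \begin{bmatrix} 7 \\ 4 \end{bmatrix}, \quad\text{in } \mathbb{F}_{13}.$$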


The system has no solution in 𝔽_2 or 𝔽_3. It happens to have solutions in 𝔽_7, though
det A = 0 in that field.
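
The procedure can be sketched for 2 × 2 systems as follows. The entries of `A` and `B` below are an assumption: they are integers chosen so that det A = 42, consistent with the statements made about this example. The function names are illustrative, and the inverse of the determinant is found by trial and error, which is adequate for small p.

```python
# Assumed integer system with det A = 8*6 - 3*2 = 42.
A = [[8, 3], [2, 6]]
B = [3, -1]


def solve_mod_p(A, B, p):
    """Solve a 2 x 2 system AX = B in F_p using (adj A)A = (det A)I.

    Returns (A_inverse, X) with entries reduced modulo p, or None when
    det A is zero in F_p and the matrix can not be inverted.
    """
    (a, b), (c, d) = A
    det = (a * d - b * c) % p
    if det == 0:
        return None
    det_inv = next(t for t in range(1, p) if det * t % p == 1)
    adj = [[d, -b], [-c, a]]
    A_inv = [[det_inv * e % p for e in row] for row in adj]
    X = [sum(A_inv[i][j] * B[j] for j in range(2)) % p for i in range(2)]
    return A_inv, X


def brute_force_solutions(A, B, p):
    """All pairs (x, y) in F_p x F_p solving the system, by exhaustion."""
    return [(x, y) for x in range(p) for y in range(p)
            if all((A[i][0] * x + A[i][1] * y - B[i]) % p == 0
                   for i in range(2))]
```

With these entries, `solve_mod_p(A, B, 13)` returns the inverse `[[2, 12], [8, 7]]` and the solution `[7, 4]`. The exhaustive search finds no solution modulo 2 or 3 and seven solutions modulo 7, which illustrates that a vanishing determinant rules out uniqueness but not existence.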

We remark in passing that invertible matrices with entries in the field 𝔽_p provide
new examples of finite groups, the general linear groups over finite fields:


GL_n(𝔽_p) = {n × n invertible matrices with entries in 𝔽_p}.

The smallest of these is the group GL_2(𝔽_2) of invertible 2 × 2 matrices with entries
(modulo 2), which consists of the six matrices

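(2.10)

$$GL_2(\mathbb{F}_2) = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \right\}.$$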
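
The count of six can be confirmed by enumeration. This is a minimal sketch, assuming that a 2 × 2 matrix over 𝔽_p is invertible exactly when its determinant is nonzero in 𝔽_p, as Cramer's Rule shows; the function name `gl2` is illustrative.

```python
from itertools import product


def gl2(p):
    """List the invertible 2 x 2 matrices with entries in F_p.

    A matrix ((a, b), (c, d)) is kept when ad - bc is nonzero modulo p.
    """
    return [((a, b), (c, d))
            for a, b, c, d in product(range(p), repeat=4)
            if (a * d - b * c) % p != 0]
```

Here `len(gl2(2))` is 6, and `len(gl2(3))` is 48. The enumeration examines all p^4 matrices, so it is meant only for very small p.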

There is one property of the finite fields F = 𝔽_p which distinguishes them
from subfields of ℂ and which affects computations occasionally. This property is
that adding 1 to itself a certain number of times (in fact p times) gives 0. A field F
is said to have characteristic p if 1 + ... + 1 (p terms) = 0 in F, and if p is the
smallest positive integer with that property. In other words, the characteristic of F is
the order of 1, as an element of the additive group F^+, provided that the order
is finite (Chapter 2, Section 2). In case the order is infinite, that is, 1 + ... + 1 is
never 0 in F, the field is, paradoxically, said to have characteristic zero. Thus
subfields of ℂ have characteristic zero, while the prime field 𝔽_p has characteristic
p. It can be shown that the characteristic of any field F is either zero or a prime
number.

Now let F be an arbitrary field. A vector space over a field F is defined as in
(1.6), with F replacing ℝ.

(2.11) Definition. A vector space V over a field F is a set together with two laws
of composition:

(a) addition: V × V → V, written v, w ↦ v + w,

(b) scalar multiplication: F × V → V, written c, v ↦ cv,

and satisfying the following axioms:

(i) Addition makes V into a commutative group V^+.

(ii) Scalar multiplication is associative with multiplication in F:

(ab)v = a(bv), for all a, b ∈ F and v ∈ V.

(iii) The element 1 acts as identity: 1v = v, for all v ∈ V.

(iv) Two distributive laws hold:

(a + b)v = av + bv and a(v + w) = av + aw,

for all a, b ∈ F and v, w ∈ V.

All of Section 1 can be repeated, replacing the field ℝ by F. Thus the space F^n
of row vectors (a_1, ..., a_n), a_i ∈ F, is a vector space over F, and so on.

It is important to note that the definition of vector space includes implicitly the
choice of a field F. The elements of this field F are often called scalars. We usually
keep this field fixed. Of course, if V is a complex vector space, meaning a vector
space over the field ℂ, and if F ⊂ ℂ is any subfield, then V is also naturally a vector
space over F because cv is defined for all c ∈ F. But we consider the vector space
structure to have changed when we restrict the scalars from ℂ to F.

Two important concepts analogous to subgroups and isomorphisms of groups
are the concepts of subspace and of isomorphism of vector spaces. We have already
defined subspaces for complex vector spaces, and the definition is the same for any
field. A subspace W of a vector space V (over a field F) is a subset with the following
properties:

(2.12)

(a) If w, w' ∈ W, then w + w' ∈ W.

(b) If w ∈ W and c ∈ F, then cw ∈ W.

(c) 0 ∈ W.

A subspace W is called a proper subspace of V if it is neither the whole space V nor
the zero subspace {0}.

It is easy to see that a subspace is just a subset on which the laws of composition
induce the structure of vector space.

As in Section 1, the space of all solutions of a system of m linear equations in
n unknowns


AX = 0,

with coefficients in F, is an example of a subspace of the space F^n.

(2.13) Definition. An isomorphism φ from a vector space V to a vector space V',
both over the same field F, is a bijective map φ: V → V' compatible with the laws
of composition, that is, a bijective map satisfying

(a) φ(v + v') = φ(v) + φ(v') and (b) φ(cv) = cφ(v),

for all v, v' ∈ V and all c ∈ F.

(2.14) Examples.

(a) The space F^n of n-dimensional row vectors is isomorphic to the space of
n-dimensional column vectors.

(b) View the set of complex numbers ℂ as a real vector space, as in (1.8b). Then
the map φ: ℝ^2 → ℂ sending (a, b) ↦ a + bi is an isomorphism.



In this section we discuss the terminology used when working with the two operations,
addition and scalar multiplication, in an abstractly given vector space. The
new concepts are span, linear independence, and basis.

It will be convenient to work with ordered sets of vectors here. The ordering
will be unimportant much of the time, but it will enter in an essential way when we
make explicit computations. We’ve been putting curly brackets around unordered
sets, so in order to distinguish ordered from unordered sets, let us enclose ordered
sets with round brackets. Thus the ordered set (a, b) is considered different from the
ordered set (b, a), whereas the unordered sets {a, b} and {b, a} are considered equal.
Repetitions will also be allowed in an ordered set. So (a, a, b) is considered an ordered
set, and it is different from (a, b), in contrast to the convention for unordered
sets, where {a, a, b} would denote the same set as {a, b}.

Let V be a vector space over a field F, and let (v_1, ..., v_n) be an ordered set of
elements of V. A linear combination of (v_1, ..., v_n) is any vector of the form

(3.1) w = c_1 v_1 + c_2 v_2 + ... + c_n v_n, c_i ∈ F.

For example, suppose that the ordered set consists of the two vectors in ℝ^3
considered in (1.4): v_1 = (1, 0, 1)^t and v_2 = (1, 2, 0)^t. Then a linear combination
will have the form (1.5): (c_1 + c_2, 2c_2, c_1)^t. The vector (3, 4, 1)^t = v_1 + 2v_2 is one
such linear combination.

A solution X of a system of linear equations written in the matrix form AX = B
[Chapter 1 (1.9)] exhibits the column vector B as a linear combination of the
columns of the matrix A. The coefficients are the entries of the vector X.
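
The two descriptions, a matrix product AX and a linear combination of the columns of A, can be compared directly. This is a minimal sketch using the vectors v_1 = (1, 0, 1)^t and v_2 = (1, 2, 0)^t as the columns of A; the helper names `mat_vec` and `combination` are illustrative.

```python
def mat_vec(A, X):
    """Product of a matrix (list of rows) with a column vector."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]


def combination(vectors, coeffs):
    """The linear combination c_1 v_1 + ... + c_n v_n."""
    length = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(length)]


v1 = [1, 0, 1]
v2 = [1, 2, 0]
# The matrix whose columns are v1 and v2, written row by row.
A = [[v1[i], v2[i]] for i in range(3)]
X = [1, 2]
```

Both `mat_vec(A, X)` and `combination([v1, v2], X)` evaluate to `[3, 4, 1]`, the vector v_1 + 2v_2.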

A linear combination of a single vector (v) is just a multiple cv or v . 

The set of all vectors w which are linear combinations of $(v_1, \ldots, v_n)$ forms a
subspace W of V, called the subspace spanned by the set: If w (3.1) and
$w' = c_1' v_1 + \cdots + c_n' v_n$ are elements of W, then so is

$w + w' = (c_1 + c_1')v_1 + \cdots + (c_n + c_n')v_n$,

and if $a \in F$, then $aw = (ac_1)v_1 + \cdots + (ac_n)v_n$ is in W. So $w + w'$ and $aw$ are in
W. Finally, $0 = 0v_1 + \cdots + 0v_n \in W$. This shows that the conditions of (2.12)
hold.

The space spanned by a set S will often be denoted by Span S. Clearly, Span S
is the smallest subspace of V which contains S. We could also call it the subspace
generated by S. Note that the order is irrelevant here. The span of S is the same as
the span of any reordering of S.

One can also define the span of an infinite set of vectors. We will discuss this 
in Section 5. In this section, let us assume that our sets are finite. 

(3.2) Proposition. Let S be a set of vectors of V, and let W be a subspace of V. If
$S \subset W$, then $\mathrm{Span}\, S \subset W$.

This is obvious, because W is closed under addition and scalar multiplication. If
$S \subset W$, then any linear combination of vectors of S is in W too. □

A linear relation among vectors $v_1, \ldots, v_n$ is any relation of the form

(3.3) $c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0$,

where the coefficients $c_i$ are in F. An ordered set $(v_1, \ldots, v_n)$ of vectors is called
linearly independent if there is no linear relation among the vectors in the set, except
for the trivial one in which all the coefficients $c_i$ are zero. It is useful to state this
condition positively:

(3.4) Let $(v_1, \ldots, v_n)$ be a linearly independent set. Then
from the equation $c_1 v_1 + \cdots + c_n v_n = 0$,
we can conclude that $c_i = 0$ for every $i = 1, \ldots, n$.

Conversely, if (3.4) holds, then the vectors are linearly independent.

The vectors (1.4) are linearly independent. 
Note that a linearly independent set S can not have any repetitions. For if two
vectors $v_i$, $v_j$ of S are equal, then

$v_i - v_j = 0$

is a linear relation of the form (3.3), the other coefficients being zero. Also, no vector
$v_i$ of a linearly independent family may be zero, because if it is, then $v_i = 0$ is a
linear relation.

A set which is not linearly independent is called linearly dependent.

If V is the space $F^m$ and if the vectors $(v_1, \ldots, v_n)$ are given explicitly, we can
decide linear independence by solving a system of homogeneous linear equations.
For to say that a linear combination $x_1 v_1 + \cdots + x_n v_n$ is zero means that each
coordinate is zero, and this leads to m equations in the n unknowns $x_i$. For example,
consider the set of three vectors

(3.5) $v_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$, $v_2 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$, $v_3 = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}$.


Let A denote the matrix whose columns are these vectors:

(3.6) $A = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{bmatrix}$.

A general linear combination of the vectors will have the form $x_1 v_1 + x_2 v_2 + x_3 v_3$.
Bringing the scalar coefficients to the other side, we can write this linear combination
in the form AX, where $X = (x_1, x_2, x_3)^t$. Since $\det A = 1$, the equation $AX = 0$
has only the trivial solution, and this shows that $(v_1, v_2, v_3)$ is a linearly independent
set. On the other hand, if we add an arbitrary fourth vector $v_4$ to this set, the result
will be linearly dependent, because every system of three homogeneous equations in
four unknowns has a nontrivial solution [Chapter 1 (2.17)].
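
The independence test just described can be sketched in Python. Instead of computing a determinant, this sketch row-reduces over the rationals and compares the rank with the number of vectors; that is a swapped-in technique, valid because $AX = 0$ has only the trivial solution exactly when the columns of A have full rank. The names `rank` and `is_independent`, and the fourth vector used below, are assumptions for illustration. Vectors are listed as rows, which does not change the rank.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_independent(vectors):
    """A list of vectors is independent exactly when its rank equals its length."""
    return rank(vectors) == len(vectors)

v1, v2, v3 = (1, 0, 1), (1, 2, 0), (2, 1, 2)   # the vectors (3.5)
v4 = (5, -3, 7)                                # hypothetical fourth vector
print(is_independent([v1, v2, v3]))      # True
print(is_independent([v1, v2, v3, v4]))  # False
```

Any choice of `v4` gives `False` in the last line, since four vectors in a three-dimensional space can have rank at most 3.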

Here are some elementary facts about linear independence.

(3.7) Proposition.

(a) Any reordering of a linearly independent set is linearly independent.

(b) If $v_1 \in V$ is a nonzero vector, then the set $(v_1)$ is linearly independent.

(c) A set $(v_1, v_2)$ of two vectors is linearly dependent if and only if either $v_1 = 0$,
or else $v_2$ is a multiple of $v_1$.

Let us verify the third of these assertions: Assume $(v_1, v_2)$ dependent. Let the relation
be $c_1 v_1 + c_2 v_2 = 0$, where $c_1, c_2$ are not both zero. If $c_2 \neq 0$, we can solve for
$v_2$:

$v_2 = \dfrac{-c_1}{c_2} v_1$.

In this case $v_2$ is a multiple of $v_1$. If $c_2 = 0$, then $c_1 \neq 0$ and the equation shows that
$v_1 = 0$. Conversely, if $v_2 = c v_1$, then the relation $c v_1 - v_2 = 0$ shows that the set
$(v_1, v_2)$ is linearly dependent, and if $v_1 = 0$, then the relation $v_1 + 0 v_2 = 0$ shows
the same thing. □


A set of vectors $(v_1, \ldots, v_n)$ which is linearly independent and which also spans
V is called a basis. For example, the vectors (1.4) form a basis for the space of solutions
of the linear equation (1.3). We will often use a symbol such as B to denote a
basis.

Let $B = (v_1, \ldots, v_n)$ be a basis. Then since B spans V, every $w \in V$ can be
written as a linear combination (3.1). Since B is linearly independent, this expression
is unique.


(3.8) Proposition. The set $B = (v_1, \ldots, v_n)$ is a basis if and only if every vector
$w \in V$ can be written in a unique way in the form (3.1).

Proof. Suppose that B is a basis and that w is written as a linear combination in
two ways, say (3.1) and also $w = c_1' v_1 + \cdots + c_n' v_n$. Then

$0 = w - w = (c_1 - c_1')v_1 + \cdots + (c_n - c_n')v_n$.

Hence by (3.4) $c_1 - c_1' = 0, \ldots, c_n - c_n' = 0$. Thus the two linear combinations
are the same. On the other hand, the definition of linear independence for B can be
restated by saying that 0 has only one expression as a linear combination. This
proves the converse. □

(3.9) Example. Let $V = F^n$ be the space of column vectors, and let $e_i$ denote the
column vector with 1 in the ith position and zeros elsewhere. The n vectors $e_i$ form
a basis for $F^n$ called the standard basis. This basis was introduced before, in
Chapter 1, Section 4. We will denote it by E. Every vector $X = (x_1, \ldots, x_n)^t$ has the
unique expression

$X = x_1 e_1 + \cdots + x_n e_n$

as a linear combination of $E = (e_1, \ldots, e_n)$.

The set (3.5) is another basis of $\mathbb{R}^3$.

We now discuss the main facts (3.15-3.17) which relate the three notions of
span, linear independence, and basis.

(3.10) Proposition. Let L be a linearly independent ordered set in V, and let
$v \in V$ be any vector. Then the ordered set $L' = (L, v)$ obtained by adding v to L is
linearly independent if and only if v is not in the subspace spanned by L.

Proof. Say that $L = (v_1, \ldots, v_r)$. If $v \in \mathrm{Span}\, L$, then $v = c_1 v_1 + \cdots + c_r v_r$
for some $c_i \in F$. Hence

$c_1 v_1 + \cdots + c_r v_r + (-1)v = 0$

is a linear relation among the vectors of $L'$, and the coefficient $-1$ is not zero. Thus
$L'$ is linearly dependent.

Conversely, suppose that $L'$ is linearly dependent, so that there is some linear
relation

$c_1 v_1 + \cdots + c_r v_r + bv = 0$,

in which not all coefficients are zero. Then certainly $b \neq 0$. For, if b were zero, the
expression would reduce to

$c_1 v_1 + \cdots + c_r v_r = 0$.

Since L is assumed to be linearly independent, we could conclude that
$c_1 = \cdots = c_r = 0$ too, contrary to hypothesis. Now that we know $b \neq 0$, we can
solve for v:

$v = \dfrac{-c_1}{b} v_1 + \cdots + \dfrac{-c_r}{b} v_r$.

Thus $v \in \mathrm{Span}\, L$. □

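(3.11) Proposition. Let S be an ordered set of vectors, let $v \in V$ be any vector,
and let $S' = (S, v)$. Then $\mathrm{Span}\, S = \mathrm{Span}\, S'$ if and only if $v \in \mathrm{Span}\, S$.

Proof. By definition, $v \in \mathrm{Span}\, S'$. So if $v \notin \mathrm{Span}\, S$, then $\mathrm{Span}\, S \neq \mathrm{Span}\, S'$.
Conversely, if $v \in \mathrm{Span}\, S$, then $S' \subset \mathrm{Span}\, S$; hence $\mathrm{Span}\, S' \subset \mathrm{Span}\, S$
(3.2). The fact that $\mathrm{Span}\, S' \supset \mathrm{Span}\, S$ is trivial, and so $\mathrm{Span}\, S' = \mathrm{Span}\, S$. □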

(3.12) Definition. A vector space V is called finite-dimensional if there is some 
finite set S which spans V. 

For the rest of this section，we assume that our given vector space V is finite- 
dimensional. 


(3.13) Proposition. Any finite set S which spans V contains a basis. In particular,
any finite-dimensional vector space has a basis.

Proof. Suppose $S = (v_1, \ldots, v_n)$ and that S is not linearly independent. Then
there is a linear relation

$c_1 v_1 + \cdots + c_n v_n = 0$

in which some $c_i$ is not zero, say $c_n \neq 0$. Then we may solve for $v_n$:

$v_n = \dfrac{-c_1}{c_n} v_1 + \cdots + \dfrac{-c_{n-1}}{c_n} v_{n-1}$.

This shows that $v_n \in \mathrm{Span}(v_1, \ldots, v_{n-1})$. Putting $v = v_n$ and $S = (v_1, \ldots, v_{n-1})$ in
(3.11), we conclude $\mathrm{Span}(v_1, \ldots, v_{n-1}) = \mathrm{Span}(v_1, \ldots, v_n) = V$. So we may eliminate
$v_n$ from S. Continuing this way we eventually obtain a family which is linearly
independent but still spans V, that is, a basis.
Note. There is a problem with this proof if V is the zero vector space {0}. For,
starting with an arbitrary collection of vectors in V (all of them equal to zero), our
procedure will throw them out, one at a time, until there is only one vector $v_1 = 0$
left. And (0) is a linearly dependent set. How can we eliminate it? Of course the
zero vector space is not particularly interesting. But it may lurk around, waiting to
trip us up. We have to allow the possibility that a vector space which arises in the
course of some computation, such as solving a system of homogeneous linear equations,
is the zero space. In order to avoid having to make special mention of this
case in the future, we adopt the following conventions:

(3.14) (a) The empty set is linearly independent.

(b) The span of the empty set is the zero subspace.

Thus the empty set is a basis for the zero vector space. These conventions allow us
to throw out the last vector $v_1 = 0$, and rescue the proof. □


(3.15) Proposition. Let V be a finite-dimensional vector space. Any linearly independent
set L can be extended by adding elements, to get a basis.

Proof. Let S be a finite set which spans V. If all elements of S are in Span L,
then L spans V (3.2) and so it is a basis. If not, choose $v \in S$ which is not in
Span L. By (3.10), $(L, v)$ is linearly independent. Continue until you get a basis. □
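
Propositions (3.13) and (3.15) can both be illustrated by one short Python sketch for vectors with rational entries. It follows the proof of (3.15): it walks through a spanning list and adds a vector whenever that vector is not in the span of those already chosen, using (3.10). Applied to the empty list, this yields a basis contained in the spanning set, which is a forward variant of (3.13) and not the elimination procedure used in its proof. The function names are assumptions, and the caller is assumed to pass a list that really is independent.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by Gaussian elimination over Q."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extend_to_basis(independent, spanning):
    """Extend an independent list to a basis of Span(spanning), as in (3.15)."""
    basis = list(independent)
    for v in spanning:
        # By (3.10), (basis, v) is independent exactly when v is not in
        # Span(basis); here that shows up as an increase in rank.
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis

def extract_basis(spanning):
    """Select a basis from a spanning list by starting from the empty set."""
    return extend_to_basis([], spanning)

print(extract_basis([(0, 0), (1, 2), (2, 4), (0, 1)]))  # [(1, 2), (0, 1)]
print(extract_basis([(0, 0, 0)]))                       # []
```

The second call returns the empty list for the zero space, in line with the conventions (3.14).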

(3.16) Proposition. Let S, L be finite subsets of V. Assume that S spans V and that
L is linearly independent. Then S contains at least as many elements as L does.

Proof. To prove this, we write out what a relation of linear dependence on L
means in terms of the set S, obtaining a homogeneous system of m linear equations
in n unknowns, where $m = |S|$ and $n = |L|$. Say that $S = (v_1, \ldots, v_m)$ and
$L = (w_1, \ldots, w_n)$. We write each vector $w_j$ as a linear combination of S, which we
can do because S spans V, say

$w_j = a_{1j} v_1 + \cdots + a_{mj} v_m = \sum_i a_{ij} v_i$.

Let $u = c_1 w_1 + \cdots + c_n w_n = \sum_j c_j w_j$ be a linear combination. Substituting, we
obtain

$u = \sum_{i,j} c_j a_{ij} v_i$.

The coefficient of $v_i$ in this sum is $\sum_j a_{ij} c_j$. If this coefficient is zero for every i, then
$u = 0$. So to find a linear relation among the vectors of L, it suffices to solve the
system $\sum_j a_{ij} x_j = 0$ of m equations in n unknowns. If $m < n$, then this system has a
nontrivial solution [see Chapter 1 (2.17)], and therefore L is linearly dependent. □


(3.17) Proposition. Two bases $B_1$, $B_2$ of the vector space V have the same number
of elements.

Proof. Put $B_1 = S$, $B_2 = L$ in (3.16) to get $|B_1| \geq |B_2|$. By symmetry,
$|B_2| \geq |B_1|$. □

(3.18) Definition. The dimension of a finite-dimensional vector space V is the 
number of vectors in a basis. The dimension will be denoted by dim V. 

(3.19) Proposition.

(a) If S spans V, then $|S| \geq \dim V$, and equality holds only if S is a basis.

(b) If L is linearly independent, then $|L| \leq \dim V$, and equality holds only if L is
a basis.

Proof. This follows from (3.13) and (3.15). □

(3.20) Proposition. If $W \subset V$ is a subspace of a finite-dimensional vector space,
then W is finite-dimensional, and $\dim W \leq \dim V$. Moreover, $\dim W = \dim V$
only if $W = V$.

Proof. This will be obvious, once we show that W is finite-dimensional. For, if
$W < V$, that is, if W is contained in but not equal to V, then a basis for W will not
span V, but it can be extended to a basis of V by (3.15). Hence $\dim W < \dim V$. We
now check finite-dimensionality: If some given linearly independent set L in W does
not span W, there is a vector $w \in W$ not in Span L, and by Proposition (3.10),
$(L, w)$ is linearly independent. So, we can start with the empty set and add elements
of W using (3.10), hoping to end up with a basis of W. Now it is obvious that if L is
a linearly independent set in W then it is also linearly independent when viewed as a
subset of V. Therefore (3.16) tells us that $|L| \leq n = \dim V$. So the process of
adding vectors to L must come to an end after at most n steps. When it is impossible
to apply (3.10) again, L is a basis of W. This shows that W is finite-dimensional, as
required. □

Notes. 

(a) The key facts to remember are (3.13), (3.15), and (3.16). The others follow.

(b) This material is not deep. Given the definitions, you could produce a proof of 
the main result (3.16) in a few days or less, though your first try would probably 
be clumsy. 

One important example of a vector space is obtained from an arbitrary set S by
forming linear combinations of elements of S with coefficients in F in a formal way.
If $S = (s_1, \ldots, s_n)$ is a finite ordered set whose elements are distinct, then this space
$V = V(S)$ is the set of all expressions

(3.21) $a_1 s_1 + \cdots + a_n s_n$, $a_i \in F$.

Addition and scalar multiplication are carried out formally, assuming no relations
among the elements $s_i$:

(3.22)
$(a_1 s_1 + \cdots + a_n s_n) + (b_1 s_1 + \cdots + b_n s_n) = (a_1 + b_1)s_1 + \cdots + (a_n + b_n)s_n$

$c(a_1 s_1 + \cdots + a_n s_n) = (ca_1)s_1 + \cdots + (ca_n)s_n$.

This vector space is isomorphic to $F^n$, by the correspondence

(3.23) $(a_1, \ldots, a_n) \leftrightarrow a_1 s_1 + \cdots + a_n s_n$.

Therefore the elements $s_i$, interpreted as the linear combinations

$s_1 = 1 s_1 + 0 s_2 + \cdots + 0 s_n$,

form a basis which corresponds to the standard basis of $F^n$ under the isomorphism
(3.23). Because of this, V(S) is often referred to as the space with basis S, or the
space of formal linear combinations of S. If S is an infinite set, V(S) is defined to be
the space of all finite expressions (3.21), where $s_i \in S$ (see Section 5).

Since V(S) is isomorphic to $F^n$ when S contains n elements, there is no compelling
logical reason for introducing it. However, in many applications, V(S) has a
natural interpretation. For example, if S is a set of ingredients, then a vector v may
represent a recipe. Or if S is a set of points in the plane, then v (3.21) can be
interpreted as a set of weights at the points of S.
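
As a rough illustration of the recipe interpretation, formal linear combinations can be modeled in Python as dictionaries from elements of S to coefficients, with a missing key standing for coefficient 0. The ingredient names, the two recipes, and the function names below are all hypothetical; the sketch only mirrors the rules (3.22) and the correspondence (3.23).

```python
def add(u, v):
    """Add two formal linear combinations coefficient by coefficient, as in (3.22)."""
    result = dict(u)
    for s, coeff in v.items():
        result[s] = result.get(s, 0) + coeff
    return result

def scale(c, u):
    """Multiply every coefficient of u by the scalar c, as in (3.22)."""
    return {s: c * coeff for s, coeff in u.items()}

def to_coordinates(u, S):
    """Image of u under (3.23): the tuple of coefficients in the order of S."""
    return tuple(u.get(s, 0) for s in S)

S = ("flour", "milk", "egg")                   # hypothetical ordered set of ingredients
pancakes = {"flour": 2, "milk": 3, "egg": 1}   # hypothetical recipe
custard = {"milk": 4, "egg": 2}                # hypothetical recipe

print(to_coordinates(add(pancakes, custard), S))  # (2, 7, 3)
print(to_coordinates(scale(2, custard), S))       # (0, 8, 4)
```

Passing through `to_coordinates` turns the formal operations into ordinary vector addition and scalar multiplication in $F^n$, which is the content of the isomorphism (3.23).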


4. COMPUTATION WITH BASES

The purpose of bases in vector spaces is to provide a method of computation, and we
are going to learn to use them in this section. We will consider two topics: how to
express a vector in terms of a given basis, and how to relate two different bases of
the same vector space.

Suppose we are given a basis $(v_1, \ldots, v_n)$ of a vector space V. Remember: This
means that every vector $v \in V$ can be expressed as a linear combination

(4.1) $v = x_1 v_1 + \cdots + x_n v_n$, $x_i \in F$,

in exactly one way. The scalars $x_i$ are called the coordinates of v, and the column
vector

(4.2) $X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$

is called the coordinate vector of v, with respect to the basis. We pose the problem
of computing this coordinate vector.

The simplest case to understand is that V is the space of column vectors $F^n$.
Let $B = (v_1, \ldots, v_n)$ be a basis of $F^n$. Then each element $v_i$ of our basis is a column
vector, and so the array $(v_1, \ldots, v_n)$ forms an $n \times n$ matrix. It seems advisable to
introduce a new symbol for this matrix, so we will write it as

(4.3) $[B] = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}$.

For example, if B is the basis 
(4.4) v\ = \ ? v 2 = 


then 





If $E = (e_1, \ldots, e_n)$ is the standard basis, the matrix [E] is the identity matrix.

A linear combination $x_1 v_1 + \cdots + x_n v_n$ can be written as the matrix product

(4.5) $[B]X = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = v_1 x_1 + \cdots + v_n x_n$,

where X denotes the column vector $(x_1, \ldots, x_n)^t$. This is another example of block
multiplication. The only new feature is that the definition of matrix multiplication
has caused the scalar coefficients $x_i$ to migrate to the right side of the vectors, which
doesn't matter.

Now if a vector $Y = (y_1, \ldots, y_n)^t$ is given, we can determine its coordinate vector
with respect to the basis B by solving the equation

(4.6) $\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$, or $[B]X = Y$

for the unknown vector X. This is done by inverting the matrix [B].

(4.7) Proposition. Let $B = (v_1, \ldots, v_n)$ be a basis of $F^n$, and let $Y \in F^n$ be a vector.
The coordinate vector of Y with respect to the basis B is

$X = [B]^{-1} Y$. □

Note that we get Y back if B is the standard basis E, because [E] is the identity matrix.
This is as it should be.
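
Proposition (4.7) can be tried out with a small Python sketch. Rather than forming $[B]^{-1}$ explicitly, the sketch solves $[B]X = Y$ by Gauss-Jordan elimination over the rationals, which gives the same X. The basis $v_1 = (1, 2)^t$, $v_2 = (1, 3)^t$ and the vector $Y = (3, 7)^t$ are hypothetical values chosen for illustration, and `coordinates` is an assumed name.

```python
from fractions import Fraction

def coordinates(basis, y):
    """Solve [B]X = Y, where the given basis vectors are the columns of [B]."""
    n = len(basis)
    # Augmented matrix [B | Y]: row k holds the k-th entries of the basis vectors.
    aug = [[Fraction(v[k]) for v in basis] + [Fraction(y[k])] for k in range(n)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [a / p for a in aug[col]]          # make the pivot equal to 1
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    return [row[n] for row in aug]                    # last column is X

v1, v2 = (1, 2), (1, 3)        # hypothetical basis of Q^2
Y = (3, 7)                     # hypothetical vector
X = coordinates([v1, v2], Y)
print(X == [2, 1])  # True, since Y = 2*v1 + 1*v2
```

With the standard basis in place of `[v1, v2]`, the function returns the entries of Y unchanged, as noted above.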

In Example (4.4), 



So the coordinate vector of y = 
Y = lv\ — 2vi- 


is X = 




which means that 
Of course we can not solve in this way unless the matrix is invertible. Fortunately,
[B] is always invertible, and in fact it can be any invertible matrix.

(4.8) Proposition. Let A be an $n \times n$ matrix with entries in a field F. The columns
of A form a basis of $F^n$ if and only if A is invertible.

Proof. Denote the ith column of A by $v_i$. For any column vector
$X = (x_1, \ldots, x_n)^t$, the matrix product $AX = v_1 x_1 + \cdots + v_n x_n$ is a linear combination
of the set $(v_1, \ldots, v_n)$. So this set is linearly independent if and only if the only solution
of the equation $AX = 0$ is the trivial solution $X = 0$. And as we know, this is
true if and only if A is invertible [Chapter 1 (2.18)]. Moreover, if $(v_1, \ldots, v_n)$ is a linearly
independent set, then it forms a basis because the dimension of $F^n$ is n. □


Now let V be an abstractly given vector space. We want to use matrix notation
to facilitate the manipulation of bases, and the way we have written ordered sets of
vectors was chosen with this in mind:

(4.9) $(v_1, \ldots, v_m)$.

Perhaps this array should be called a hypervector. Unless our vectors are given concretely,
we won't be able to represent this hypervector by a matrix, so we will work
with it formally, as if it were a vector. Since multiplication of two elements of a
vector space is not defined, we can not multiply two matrices whose entries are vectors.
But there is nothing to prevent us from multiplying the hypervector $(v_1, \ldots, v_m)$
by a matrix of scalars. Thus a linear combination of these vectors can be written as
the product with a column vector X:

(4.10) $(v_1, \ldots, v_m) \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} = v_1 x_1 + \cdots + v_m x_m$.

Evaluating the product, we obtain another vector, a linear combination. The scalar
coefficients $x_i$ are on the right side of the vectors as before. If we use a symbol such
as B to denote the set $(v_1, \ldots, v_m)$, then the notation for this linear combination
becomes very compact: $BX = v_1 x_1 + \cdots + v_m x_m$.

We may also multiply a hypervector on the right by a matrix of scalars. If A is
an $m \times n$ matrix, the product will be another hypervector, say $(w_1, \ldots, w_n)$:

(4.11) $(v_1, \ldots, v_m) A = (w_1, \ldots, w_n)$.

To evaluate the product, we use the rule for matrix multiplication:

(4.12) $w_j = v_1 a_{1j} + v_2 a_{2j} + \cdots + v_m a_{mj}$.

So each vector $w_j$ is a linear combination of $(v_1, \ldots, v_m)$, and the scalar coefficients in
this linear combination form the columns of the matrix A. That is what the equation
means. For example,

$(v_1, v_2) \begin{bmatrix} 3 & 2 & 1 \\ 4 & 0 & 1 \end{bmatrix} = (3v_1 + 4v_2,\ 2v_1,\ v_1 + v_2)$.
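
Rule (4.12) translates directly into a short Python sketch once the vectors are given concretely. The two vectors in $\mathbb{Q}^3$ below are hypothetical, and `hypervector_times_matrix` is an assumed name; the matrix is the one from the example above.

```python
def hypervector_times_matrix(vectors, A):
    """Return (w_1, ..., w_n) with w_j = v_1*a_1j + ... + v_m*a_mj, as in (4.12)."""
    m, n = len(A), len(A[0])
    if len(vectors) != m:
        raise ValueError("A must have one row per vector")
    dim = len(vectors[0])
    return [
        tuple(sum(vectors[i][k] * A[i][j] for i in range(m)) for k in range(dim))
        for j in range(n)
    ]

v1, v2 = (1, 0, 2), (0, 1, 1)          # hypothetical vectors
A = [[3, 2, 1],
     [4, 0, 1]]
print(hypervector_times_matrix([v1, v2], A))
# [(3, 4, 10), (2, 0, 4), (1, 1, 3)], that is, 3*v1 + 4*v2, 2*v1, v1 + v2
```

Column j of A supplies the coefficients of $w_j$, which is why the result has as many vectors as A has columns.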


Let us restate this formally: 


(4.13) Proposition. Let $S = (v_1, \ldots, v_m)$ and $U = (w_1, \ldots, w_n)$ be ordered sets of
elements of a vector space V. The elements of U are in the span of S if and only if
there is an $m \times n$ scalar matrix A such that $(v_1, \ldots, v_m)A = (w_1, \ldots, w_n)$. □


Now let us consider the problem of determining the coordinate vector X of a
given vector $v \in V$ with respect to a given basis $B = (v_1, \ldots, v_n)$. That is, we wish
to write $v = BX$ explicitly, as in (4.10). It is clear that this is not possible unless
both the basis and the vector are given in some explicit way, so we can not solve the
problem as posed. But we can use multiplication by the hypervector B to define abstractly
an isomorphism of vector spaces

(4.14) $\psi: F^n \to V$ sending $X \mapsto BX$,

from the space $F^n$ of column vectors to V. This map is bijective because every vector
v is a linear combination (4.10) in exactly one way: it is surjective because the
set B spans V, and injective because B is linearly independent. The axioms for an
isomorphism (2.13) are easy to check. We can use this isomorphism to introduce coordinates
into the vector space V.

The coordinate vector of a vector v is $X = \psi^{-1}(v)$. Please note that the symbol
$B^{-1}$ is not defined. So unless the basis is given more specifically, we won't have an
explicit formula for the inverse function $\psi^{-1}$. But the existence of the isomorphism $\psi$
is of interest in itself:


(4.15) Corollary. Every vector space V of dimension n is isomorphic to the space
$F^n$ of column vectors. □

Notice that $F^n$ is not isomorphic to $F^m$ if $m \neq n$, because $F^n$ has a basis of n
elements, and the number of elements in a basis depends only on the vector space,
not on the choice of a basis. Thus the finite-dimensional vector spaces V over a field
F are completely classified by (4.15): Every V is isomorphic to $F^n$, for some
uniquely determined integer n. It follows that we will know all about an arbitrary
vector space if we study the basic examples of column vectors. This reduces any
problem on vector spaces to the familiar algebra of column vectors, once a basis is
given.

We now come to a very important computational method: change of basis.
Identifying V with the isomorphic vector space $F^n$ is useful when a natural basis is
presented to us, but not when the given basis is poorly suited to the problem at hand.
In that case, we will want to change coordinates. So let us suppose that we are given
two bases for the same vector space V, say $B = (v_1, \ldots, v_n)$ and $B' = (v_1', \ldots, v_n')$.
We will think of B as the old basis, and $B'$ as a new basis. There are two computations
which we wish to clarify. We ask first: How are the two bases related? Secondly,
a vector $v \in V$ will have coordinates with respect to each of these bases, but
of course they will be different. So we ask: How are the two coordinate vectors related?
These are the computations called change of basis. They will be very important
in later chapters. They are also confusing and can drive you nuts if you don't
organize the notation well.

We begin by noting that since the new basis spans V, every vector of the old
basis B is a linear combination of the new basis $B' = (v_1', \ldots, v_n')$. So Proposition
(4.13) tells us that there is an equation of the form

(4.16) $(v_1', \ldots, v_n') P = (v_1, \ldots, v_n)$, or $B'P = B$,

where P is an $n \times n$ matrix of scalars. This matrix equation reads

(4.17) $v_1' p_{1j} + v_2' p_{2j} + \cdots + v_n' p_{nj} = v_j$,

where $p_{ij}$ are the entries of P. The matrix P is called the matrix of change of basis.
Its jth column is the coordinate vector of the old basis vector $v_j$, when computed
with respect to the new basis $B'$.

Note that the matrix of change of basis is invertible. This can be shown as follows:
Interchanging the roles of B and $B'$ provides a matrix $P'$ such that $BP' = B'$.
Combining this with (4.16), we obtain the relation $BP'P = B$:

$(v_1, \ldots, v_n) P'P = (v_1, \ldots, v_n)$.

This formula expresses each $v_i$ as a linear combination of the vectors $(v_1, \ldots, v_n)$. The
entries of the product matrix $P'P$ are the coefficients. But since B is a linearly independent
set, there is only one way to write $v_i$ as such a linear combination of
$(v_1, \ldots, v_n)$, namely $v_i = v_i$, or $BI = B$. So $P'P = I$. This shows that P is invertible.

Now let X be the coordinate vector of v, computed with respect to the old basis
B, that is, $v = BX$. Substituting (4.16) gives us the matrix equation

(4.18) $v = BX = B'PX$.

This equation shows that $PX = X'$ is the coordinate vector of v with respect to the
new basis $B'$.

Recapitulating, we have a single matrix P, the matrix of change of basis, with
the dual properties

(4.19) $B = B'P$ and $PX = X'$,

where X, $X'$ denote the coordinate vectors of an arbitrary vector v with respect to the
two bases. Each of these properties characterizes P. Note the position of the primes
carefully.

We can compute the matrix of change of basis explicitly when $V = F^n$ and the
old basis is the standard basis E, but where the new basis $B'$ is arbitrary. The two
bases determine matrices $[E] = I$ and $[B']$, as in (4.3). Formula (4.19) gives us the
matrix equation $I = [B']P$. Hence the matrix of change of basis is

(4.20) $P = [B']^{-1}$, if $V = F^n$ and if the old basis is E.

We can also write this as $[B'] = P^{-1}$. So

(4.21) If the old basis is E, the new basis vectors are the columns of $P^{-1}$.
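
The two properties (4.19) can be checked numerically in $F^2$ with a brief Python sketch. When both bases consist of column vectors, $B = B'P$ becomes the matrix equation $[B] = [B']P$, so $P = [B']^{-1}[B]$; this reduces to (4.20) when the old basis is E. The two bases and the coordinate vector below are hypothetical, and the helper names are assumptions.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

B_old = [[1, 1],
         [0, 1]]            # hypothetical old basis: columns (1,0)^t and (1,1)^t
B_new = [[1, 1],
         [2, 3]]            # hypothetical new basis: columns (1,2)^t and (1,3)^t

P = matmul(inverse_2x2(B_new), B_old)   # matrix of change of basis
X = [[1], [2]]                          # coordinates of some v in the old basis
X_new = matmul(P, X)                    # coordinates of the same v in the new basis

print(matmul(B_new, P) == B_old)                  # True: B = B'P
print(matmul(B_new, X_new) == matmul(B_old, X))   # True: both sides equal v
```

Here `P` works out to the matrix with rows (3, 2) and (-2, -1), and the vector v is $(3, 2)^t$ in standard coordinates.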

In the above discussion, the matrix P was determined in terms of two bases B
and $B'$. We could also turn the discussion around, starting with just one basis B and
an invertible matrix $P \in GL_n(F)$. Then we can define a new basis by formula
(4.16), that is,

(4.22) $B' = BP^{-1}$.

The vectors $v_i$ making up the old basis are in the span of $B'$ because $B = B'P$
(4.13). Hence $B'$ spans V and, having the right number of elements, $B'$ is a basis.

(4.23) Corollary. Let B be a basis of a vector space V. The other bases are the
sets of the form $B' = BP^{-1}$, where $P \in GL_n(F)$ is an invertible matrix.

It is, of course, unnecessary to put an inverse matrix into this statement. Since P is
arbitrary, so is $P^{-1}$. We could just as well set $P^{-1} = Q$ and say $B' = BQ$, where
$Q \in GL_n(F)$. □

As an application of our discussion, let us compute the order of the general linear
group $GL_2(F)$ when F is the prime field $\mathbb{F}_p$. We do this by computing the number
of bases of the vector space $V = F^2$. Since the dimension of V is 2, any linearly independent
set $(v_1, v_2)$ of two elements forms a basis. The first vector $v_1$ of a linearly
independent set is not zero. And since the order of F is p, V contains $p^2$ vectors including
0. So there are $p^2 - 1$ choices for the vector $v_1$. Next, a set $(v_1, v_2)$ of two
vectors, with $v_1$ nonzero, is linearly independent if and only if $v_2$ is not a multiple of
$v_1$ (3.7). There are p multiples of a given nonzero vector $v_1$. Therefore if $v_1$ is given,
there are $p^2 - p$ vectors $v_2$ such that $(v_1, v_2)$ is linearly independent. This gives us

$(p^2 - 1)(p^2 - p) = p(p + 1)(p - 1)^2$

bases for V altogether.

(4.24) Corollary. The general linear group $GL_2(\mathbb{F}_p)$ has order $p(p + 1)(p - 1)^2$.

Proof. Proposition (4.23) establishes a bijective correspondence between
bases of $F^n$ and elements of $GL_n(F)$. □
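
For small primes, Corollary (4.24) can be confirmed by brute force. The sketch below is an independent check and not part of the argument: it counts the $2 \times 2$ matrices with entries in $\{0, \ldots, p-1\}$ whose determinant is nonzero modulo p, which are the invertible ones over $\mathbb{F}_p$, and compares the count with the formula. The function names are assumptions.

```python
from itertools import product

def count_gl2(p):
    """Count 2x2 matrices over the integers mod p (p prime) with nonzero determinant."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)

def order_formula(p):
    """The order p(p + 1)(p - 1)^2 from Corollary (4.24)."""
    return p * (p + 1) * (p - 1) ** 2

for p in (2, 3, 5):
    print(p, count_gl2(p), order_formula(p))
# 2 6 6
# 3 48 48
# 5 480 480
```

The enumeration visits $p^4$ matrices, so it is only practical for small p.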
5. INFINITE-DIMENSIONAL SPACES


Some vector spaces are too big to be spanned by any finite set of vectors. They are
called infinite-dimensional. We are not going to need them very often, but since they
are so important in analysis, we will discuss them briefly.

The most obvious example of an infinite-dimensional space is the space $\mathbb{R}^\infty$ of
infinite real vectors

(5.1) $(a) = (a_1, a_2, a_3, \ldots)$.

It can also be thought of as the space of sequences $\{a_n\}$ of real numbers. Examples
(1.7c, d) are also infinite-dimensional.

The space $\mathbb{R}^\infty$ has many important subspaces. Here are a few examples:

(5.2) Examples.

(a) Convergent sequences: $C = \{(a) \in \mathbb{R}^\infty \mid \lim_{n \to \infty} a_n \text{ exists}\}$.

(b) Bounded sequences: $\ell^\infty = \{(a) \in \mathbb{R}^\infty \mid \{a_n\} \text{ is bounded}\}$.

A sequence $\{a_n\}$ is called bounded if there is some real number b, a bound,
such that $|a_n| \leq b$ for all n.

(c) Absolutely convergent series: $\ell^1 = \{(a) \in \mathbb{R}^\infty \mid \sum_{n=1}^{\infty} |a_n| < \infty\}$.

(d) Sequences with finitely many nonzero terms:

$Z = \{(a) \in \mathbb{R}^\infty \mid a_n = 0 \text{ for all but finitely many } n\}$.

All of the above subspaces are infinite-dimensional. You should be able to make up
some more.


Now suppose that V is a vector space, infinite-dimensional or not. What should
we mean by the span of an infinite set S of vectors? The difficulty is this: It is not
always possible to assign a vector as the value of an infinite linear combination
$c_1 v_1 + c_2 v_2 + \cdots$ in a consistent way. If we are talking about the vector space of
real numbers, that is, $v_i \in \mathbb{R}^1$, then a value can be assigned provided that the series
$c_1 v_1 + c_2 v_2 + \cdots$ converges. The same can be done for convergent series of vectors
in $\mathbb{R}^n$ or $\mathbb{R}^\infty$. But many series don't converge, and then we don't know what value to
assign.

In algebra it is customary to speak only of linear combinations of finitely many
vectors. Therefore, the span of an infinite set S must be interpreted as the set of
those vectors v which are linear combinations of finitely many elements of S:

(5.3) $v = c_1 v_1 + \cdots + c_r v_r$, where $v_1, \ldots, v_r \in S$.

The number r is allowed to be arbitrarily large, depending on the vector v:

(5.4) $\mathrm{Span}\, S = \{\text{finite linear combinations of elements of } S\}$.
With this definition, Propositions (3.2) and (3.11) continue to hold.

For example, let $e_i = (0, \ldots, 0, 1, 0, \ldots)$ be the vector in $\mathbb{R}^\infty$ with 1 in the ith
position as its only nonzero coordinate. Let $S = (e_1, e_2, e_3, \ldots)$ be the infinite set of
these vectors $e_i$. The set S does not span $\mathbb{R}^\infty$, because the vector

$w = (1, 1, 1, \ldots)$

is not a (finite) linear combination. Instead the span of S is the subspace Z (5.2d).

A set S, infinite or not, is called linearly independent if there is no finite relation

(5.5) $c_1 v_1 + \cdots + c_r v_r = 0$, $v_1, \ldots, v_r \in S$,

except for the trivial relation, in which $c_1 = \cdots = c_r = 0$. Again, the number r is
allowed to be arbitrary, that is, the condition has to hold for arbitrarily large r and
arbitrary vectors $v_1, \ldots, v_r \in S$. For example, the set $S' = (w; e_1, e_2, e_3, \ldots)$ is linearly
independent, if w, $e_i$ are the vectors defined as above. With this definition of
linear independence, Proposition (3.10) continues to be true.

As with finite sets, a basis S of V is a linearly independent set which spans V.
Thus $S = (e_1, e_2, \ldots)$ is a basis of the space Z. It can be shown, using the Axiom of
Choice, that every vector space V has a basis. However, the proof doesn't tell you
how to get one. A basis for $\mathbb{R}^\infty$ will have uncountably many elements, and therefore
it can not be written down in an explicit way. We won't need bases for
infinite-dimensional spaces very often.

Let us go back for a moment to the case that our vector space V is finite-dimensional
(3.12), and ask if there can be an infinite basis. In Section 3, we saw
that any two finite bases have the same number of elements. We will now complete
the picture by showing that every basis is finite. The only confusing point is taken
care of by the following proposition:

(5.6) Proposition. Let V be finite-dimensional, and let S be any set which spans
V. Then S contains a finite subset which spans V.

Proof. By assumption, there is some finite set, say $(w_1, \ldots, w_m)$, which spans
V. Each $w_i$ is a linear combination of finitely many elements of S, since Span S = V.
So when we express the vectors $w_1, \ldots, w_m$ in terms of the set S, we only need to use
finitely many of its elements. The ones we use make up a finite subset $S' \subset S$. So,
$(w_1, \ldots, w_m) \subset \mathrm{Span}\, S'$. Since $(w_1, \ldots, w_m)$ spans V, so does $S'$. □

(5.7) Proposition. Let V be a finite-dimensional vector space.

(a) Every set S which spans V contains a finite basis.

(b) Every linearly independent set L is finite and therefore extends to a finite basis.

(c) Every basis is finite.

We leave the proof of (5.7) as an exercise. □





6. DIRECT SUMS 

Let V be a vector space, and let $W_1, \ldots, W_n$ be subspaces of V. Much of the treatment
of linear independence and spans of vectors has analogues for subspaces, and we are
going to work out these analogues here.

We consider vectors $v \in V$ which can be written as a sum

(6.1) $v = w_1 + \cdots + w_n$,

where $w_i$ is a vector in $W_i$. The set of all such vectors is called the sum of the subspaces
or their span, and is denoted by

(6.2) $W_1 + \cdots + W_n = \{v \in V \mid v = w_1 + \cdots + w_n, \text{ with } w_i \in W_i\}$.

The sum is a subspace of V, analogous to the span of a set of vectors.
Clearly, it is the smallest subspace containing $W_1, \ldots, W_n$.

The subspaces $W_1, \ldots, W_n$ are called independent if no sum $w_1 + \cdots + w_n$ with
$w_i \in W_i$ is zero, except for the trivial sum in which $w_i = 0$ for all i. In other words,
the spaces are independent if

(6.3) $w_1 + \cdots + w_n = 0$ and $w_i \in W_i$ implies $w_i = 0$ for all i.

In case the span is the whole space and the subspaces are independent, we say
that V is the direct sum of $W_1, \ldots, W_n$, and we write

(6.4) $V = W_1 \oplus \cdots \oplus W_n$, if $V = W_1 + \cdots + W_n$
and if $W_1, \ldots, W_n$ are independent.

This is equivalent to saying that every vector $v \in V$ can be written in the form (6.1)
in exactly one way.

So, if $W_1, \ldots, W_n$ are independent subspaces of a vector space V and if
$U = W_1 + \cdots + W_n$ is their sum, then in fact U is their direct sum:
$U = W_1 \oplus \cdots \oplus W_n$.
We leave the proof of the following two propositions as an exercise. 

(6.5) Proposition. 

(a) A single subspace $W_1$ is independent.

(b) Two subspaces $W_1, W_2$ are independent if and only if $W_1 \cap W_2 = (0)$. □

(6.6) Proposition. Let $W_1, \ldots, W_n$ be subspaces of a finite-dimensional vector
space V, and let $B_i$ be a basis for $W_i$.

(a) The ordered set B obtained by listing the bases $B_1, \ldots, B_n$ in order is a basis of
V if and only if V is the direct sum $W_1 \oplus \cdots \oplus W_n$.

(b) $\dim(W_1 + \cdots + W_n) \le (\dim W_1) + \cdots + (\dim W_n)$, with equality if and only
if the spaces are independent. □

(6.7) Corollary. Let W be a subspace of a finite-dimensional vector space V.
There is another subspace $W'$ such that $V = W \oplus W'$.

Proof. Let $(w_1, \ldots, w_d)$ be a basis for W. Extend to a basis
$(w_1, \ldots, w_d; v_1, \ldots, v_{n-d})$ for V (3.15). The span of $(v_1, \ldots, v_{n-d})$ is the
required subspace $W'$. □

(6.8) Example. Let $v_1, \ldots, v_n$ be nonzero vectors, and let $W_i$ be the span of the
single vector $v_i$. This is the one-dimensional subspace which consists of all scalar
multiples of $v_i$: $W_i = \{cv_i\}$. Then $W_1, \ldots, W_n$ are independent subspaces if and only if
$(v_1, \ldots, v_n)$ are independent vectors. This becomes clear if we compare (3.4) and
(6.3). The statement in terms of subspaces is actually the neater one, because the
scalar coefficients are absorbed.


(6.9) Proposition. Let $W_1, W_2$ be subspaces of a finite-dimensional vector space V.
Then

$\dim W_1 + \dim W_2 = \dim(W_1 \cap W_2) + \dim(W_1 + W_2)$.

Proof. Note first that the intersection of two subspaces is again a subspace.
Choose a basis $(u_1, \ldots, u_r)$ for the space $W_1 \cap W_2$, where $r = \dim(W_1 \cap W_2)$. This is
a linearly independent set, and it is in $W_1$. Hence we can extend it to a basis of $W_1$,
say

(6.10) $(u_1, \ldots, u_r; x_1, \ldots, x_{m-r})$,

where $m = \dim W_1$. Similarly, we can extend it to a basis

(6.11) $(u_1, \ldots, u_r; y_1, \ldots, y_{n-r})$,

of $W_2$, where $n = \dim W_2$. The proposition will follow if we show that the set

(6.12) $(u_1, \ldots, u_r; x_1, \ldots, x_{m-r}; y_1, \ldots, y_{n-r})$

is a basis of $W_1 + W_2$.

This assertion has two parts. First, the vectors (6.12) span $W_1 + W_2$. For any
vector v in $W_1 + W_2$ is a sum $v = w_1 + w_2$, with $w_i \in W_i$. We can write $w_1$ as a
linear combination of (6.10), and $w_2$ as a linear combination of (6.11). Collecting
terms, we find that v is a linear combination of (6.12).

Next, the vectors (6.12) are linearly independent: Suppose that some linear
combination is zero, say

$a_1u_1 + \cdots + a_ru_r + b_1x_1 + \cdots + b_{m-r}x_{m-r} + c_1y_1 + \cdots + c_{n-r}y_{n-r} = 0$.

Abbreviate this as $u + x + y = 0$. Solve for y: $y = -u - x \in W_1$. But $y \in W_2$
too. Hence $y \in W_1 \cap W_2$, and so y is a linear combination, say $u'$, of $(u_1, \ldots, u_r)$.
Then $-u' + y = 0$ is a relation among the vectors (6.11), which are independent.
So it must be the trivial relation. This shows that $y = 0$. Thus our original relation
reduces to $u + x = 0$. Since (6.10) is a basis, this relation is trivial: $u = 0$ and
$x = 0$. So the whole relation was trivial, as required. □
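The formula of Proposition (6.9) can be checked by brute force in a small case. The following is only an illustrative sketch, not part of the proof: it assumes the field $\mathbb{F}_2$ and the space $\mathbb{F}_2^4$, and the subspaces `W1`, `W2` and the helper names `span_f2` and `dim_f2` are made up for the example. It uses the fact that a subspace of dimension d over $\mathbb{F}_2$ has exactly $2^d$ elements.

```python
from itertools import product

def span_f2(generators, n):
    """Return the set of all F_2-linear combinations of the generators in F_2^n."""
    vectors = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        v = tuple(
            sum(c * g[i] for c, g in zip(coeffs, generators)) % 2
            for i in range(n)
        )
        vectors.add(v)
    return vectors

def dim_f2(subspace):
    """A subspace of dimension d over F_2 has exactly 2**d elements."""
    return len(subspace).bit_length() - 1

n = 4
# Hypothetical example subspaces of F_2^4, chosen only for illustration.
W1 = span_f2([(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)], n)
W2 = span_f2([(0, 1, 0, 0), (0, 0, 1, 1)], n)

# The intersection is computed as a set; the sum W1 + W2 as all sums w1 + w2.
intersection = W1 & W2
total = {tuple((a + b) % 2 for a, b in zip(v, w)) for v in W1 for w in W2}

lhs = dim_f2(W1) + dim_f2(W2)
rhs = dim_f2(intersection) + dim_f2(total)
print(lhs, rhs)
```

Because the intersection and the sum are enumerated directly as sets, the check does not rely on the formula it is testing.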
I don't need to learn 8 + 7: I'll remember 8 + 8 and subtract 1.

T. Cuyler Young, Jr.


EXERCISES
1. Real Vector Spaces


1. Which of the following subsets of the vector space of real $n \times n$ matrices is a subspace?

(a) symmetric matrices ($A = A^t$)

(b) invertible matrices

(c) upper triangular matrices

2. Prove that the intersection of two subspaces is a subspace.

3. Prove the cancellation law in a vector space: If $cv = cw$ and $c \ne 0$, then $v = w$.

4. Prove that if w is an element of a subspace W, then $-w \in W$ too.

5. Prove that the classification of subspaces of $\mathbb{R}^3$ stated after (1.2) is complete.

6. Prove that every solution of the equation $2x_1 - x_2 - 2x_3 = 0$ has the form (1.5).

7. What is the description analogous to (1.4) obtained from the particular solutions
$u_1 = (2, 2, 1)$ and $u_2 = (0, 2, -1)$?


2. Abstract Fields 


1. Prove that the set of numbers of the form $a + b\sqrt{5}$, where a, b are rational numbers, is
a field.

2. Which subsets of $\mathbb{C}$ are closed under $+$, $-$, $\times$, and $\div$ but fail to contain 1?

3. Let F be a subset of $\mathbb{C}$ such that $F^+$ is a subgroup of $\mathbb{C}^+$ and $F^\times$ is a subgroup of $\mathbb{C}^\times$.
Prove that F is a subfield of $\mathbb{C}$.

4. Let $V = F^n$ be the space of column vectors. Prove that every subspace W of V is the
space of solutions of some system of homogeneous linear equations $AX = 0$.

5. Prove that a nonempty subset W of a vector space satisfies the conditions (2.12) for a
subspace if and only if it is closed under addition and scalar multiplication.

6. Show that in Definition (2.3), axiom (ii) can be replaced by the following axiom: $F^\times$ is
an abelian group, and $1 \ne 0$. What if the condition $1 \ne 0$ is omitted?

7. Define homomorphism of fields, and prove that every homomorphism of fields is
injective.

8. Find the inverse of 5 (modulo p) for p = 2, 3, 7, 11, 13.

9. Compute the polynomial $(x^2 + 3x + 1)(x^3 + 4x^2 + 2x + 2)$ when the coefficients are
regarded as elements of the fields (a) $\mathbb{F}_5$ (b)

10. Consider the system of linear equations 



(a) Solve it in $\mathbb{F}_p$ when p = 5, 11, 17.

(b) Determine the number of solutions when p = 7.


*
11. Find all primes p such that the matrix

$A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & -1 \\ -2 & 0 & 2 \end{bmatrix}$

is invertible, when its entries are considered to be in $\mathbb{F}_p$.

12. Solve completely the systems of linear equations AX = B, where



一 1 1 0 一 


_0一 


1 

A — 

1 0 1 

，召 = 

0 

and B = 

"I 


1-1 -1 一 


0 一 


1^ 


(a) in $\mathbb{Q}$ (b) in $\mathbb{F}_2$ (c) in $\mathbb{F}_3$ (d) in $\mathbb{F}_7$.

13. Let p be a prime integer. The nonzero elements of $\mathbb{F}_p$ form a group $\mathbb{F}_p^\times$ of order $p - 1$.
It is a fact that this group is always cyclic. Verify this for all primes p < 20 by exhibiting
a generator.

14. (a) Let p be a prime. Use the fact that $\mathbb{F}_p^\times$ is a group to prove that $a^{p-1} \equiv 1$ (modulo p)
for every integer a not congruent to zero.

(b) Prove Fermat's Theorem: For every integer a,

$a^p \equiv a$ (modulo p).

15. (a) By pairing elements with their inverses, prove that the product of all nonzero
elements of $\mathbb{F}_p$ is $-1$.

(b) Let p be a prime integer. Prove Wilson's Theorem:

$(p - 1)! \equiv -1$ (modulo p).

16. Consider a system AX = B of n linear equations in n unknowns, where A and B have
integer entries. Prove or disprove: If the system has an integer solution, then it has a
solution in $\mathbb{F}_p$ for all p.

17. Interpreting matrix entries in the field $\mathbb{F}_2$, prove that the four matrices

$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$

form a field.

18. The proof of Lemma (2.8) contains a more direct proof of (2.6). Extract it.



3. Bases and Dimension 


1. Find a basis for the subspace of $\mathbb{R}^4$ spanned by the vectors (1, 2, -1, 0), (4, 8, -4, -3),
(0, 1, 3, 4), (2, 5, 1, 4).

2. Let $W \subset \mathbb{R}^4$ be the space of solutions of the system of linear equations AX = 0, where

■— ~~i 



3 

0 


Find a basis for W. 


3. (a) Show that a subset of a linearly independent set is linearly independent.
(b) Show that any reordering of a basis is also a basis.


4. Let V be a vector space of dimension n over F, and let 0 < r < n. Prove that V contains
a subspace of dimension r.

5. Find a basis for the space of symmetric $n \times n$ matrices.

6. Prove that a square matrix A is invertible if and only if its columns are linearly
independent.

7. Let V be the vector space of functions on the interval [0, 1]. Prove that the functions $x^3$,
sin x, and cos x are linearly independent.

8. Let A be an $m \times n$ matrix, and let $A'$ be the result of a sequence of elementary row
operations on A. Prove that the rows of A span the same subspace as the rows of $A'$.

9. Let V be a complex vector space of dimension n. Prove that V has dimension 2n as a real
vector space.

10. A complex $n \times n$ matrix is called hermitian if $a_{ij} = \overline{a_{ji}}$ for all i, j. Show that the
hermitian matrices form a real vector space, find a basis for that space, and determine its
dimension.

11. How many elements are there in the vector space $\mathbb{F}_p^n$?

12. Let $F = \mathbb{F}_2$. Find all bases of $F^2$.

13. Let $F = \mathbb{F}_5$. How many subspaces of each dimension does the space $F^3$ contain?

14. (a) Let V be a vector space of dimension 3 over the field $\mathbb{F}_p$. How many subspaces of
each dimension does V have?

(b) Answer the same question for a vector space of dimension 4.

15. (a) Let $F = \mathbb{F}_2$. Prove that the group $GL_2(F)$ is isomorphic to the symmetric group $S_3$.
(b) Let $F = \mathbb{F}_3$. Determine the orders of $GL_2(F)$ and of $SL_2(F)$.

16. Let W be a subspace of V.

(a) Prove that there is a subspace U of V such that $U + W = V$ and $U \cap W = 0$.

(b) Prove that there is no subspace U such that $W \cap U = 0$ and that

$\dim W + \dim U > \dim V$.

4. Computation with Bases

1. Compute the matrix P of change of basis in $F^2$ relating the standard basis E to
$B' = (v_1, v_2)$, where $v_1 = (1, 3)^t$, $v_2 = (2, 2)^t$.

2. Determine the matrix of change of basis, when the old basis is the standard basis
$(e_1, \ldots, e_n)$ and the new basis is $(e_n, e_{n-1}, \ldots, e_1)$.

3. Determine the matrix P of change of basis when the old basis is $(e_1, e_2)$ and the new basis
is $(e_1 + e_2, e_1 - e_2)$.

4. Consider the equilateral coordinate system for $\mathbb{R}^2$, given by the basis $B'$ in which $v_1 = e_1$
and $v_2$ is a vector of unit length making an angle of 120° with $v_1$. Find the matrix
relating the standard basis E to $B'$.

5. (i) Prove that the set $B = ((1, 2, 0)^t, (2, 1, 2)^t, (3, 1, 1)^t)$ is a basis of $\mathbb{R}^3$.

(ii) Find the coordinate vector of the vector $v = (1, 2, 3)^t$ with respect to this basis.

(iii) Let $B' = ((0, 1, 0)^t, (1, 0, 1)^t, (2, 1, 0)^t)$. Find the matrix P relating B to $B'$.

(iv) For which primes p is B a basis of $\mathbb{F}_p^3$?

6. Let B and $B'$ be two bases of the vector space $F^n$. Prove that the matrix of change of
basis is $P = [B']^{-1}[B]$.

7. Let $B = (v_1, \ldots, v_n)$ be a basis of a vector space V. Prove that one can get from B to any
other basis $B'$ by a finite sequence of steps of the following types:

(i) Replace $v_i$ by $v_i + av_j$, $i \ne j$, for some $a \in F$.

(ii) Replace $v_i$ by $cv_i$ for some $c \ne 0$.

(iii) Interchange $v_i$ and $v_j$.

8. Rewrite the proof of Proposition (3.16) using the notation of Proposition (4.13).

9. Let $V = F^n$. Establish a bijective correspondence between the set $\mathcal{B}$ of bases of V and
$GL_n(F)$.

10. Let F be a field containing 81 elements, and let V be a vector space of dimension 3 over
F. Determine the number of one-dimensional subspaces of V.

11. Let $F = \mathbb{F}_p$.

(a) Compute the order of $SL_2(F)$.

(b) Compute the number of bases of $F^n$, and the orders of $GL_n(F)$ and $SL_n(F)$.

12. (a) Let A be an $m \times n$ matrix with m < n. Prove that A has no left inverse by comparing
A to the square $n \times n$ matrix obtained by adding (n - m) rows of zeros at the bottom.
(b) Let $B = (v_1, \ldots, v_m)$ and $B' = (v_1', \ldots, v_n')$ be two bases of a vector space V. Prove
that m = n by defining matrices of change of basis and showing that they are
invertible.


5. Infinite-Dimensional Spaces

1. Prove that the set $(w; e_1, e_2, \ldots)$ introduced in the text is linearly independent, and
describe its span.

2. We could also consider the space of doubly infinite sequences $(a) = (\ldots, a_{-1}, a_0, a_1, \ldots)$,
with $a_i \in \mathbb{R}$. Prove that this space is isomorphic to $\mathbb{R}^\infty$.

3. Prove that the space Z is isomorphic to the space of real polynomials.

4. Describe five more infinite-dimensional subspaces of the space $\mathbb{R}^\infty$.

5. For every positive integer p, we can define the space $\ell^p$ to be the space of sequences such
that $\sum |a_i|^p < \infty$.

(a) Prove that $\ell^p$ is a subspace of $\mathbb{R}^\infty$.

(b) Prove that $\ell^p < \ell^{p+1}$.

6. Let V be a vector space which is spanned by a countably infinite set. Prove that every
linearly independent subset of V is finite or countably infinite.

7. Prove Proposition (5.7).


6. Direct Sums

1. Prove that the space $\mathbb{R}^{n \times n}$ of all $n \times n$ real matrices is the direct sum of the spaces of
symmetric matrices ($A = A^t$) and of skew-symmetric matrices ($A = -A^t$).

2. Let W be the space of $n \times n$ matrices whose trace is zero. Find a subspace $W'$ so that
$\mathbb{R}^{n \times n} = W \oplus W'$.

3. Prove that the sum of subspaces is a subspace.

4. Prove Proposition (6.5).

5. Prove Proposition (6.6).

Miscellaneous Problems


1. (a) Prove that the set of symbols $\{a + bi \mid a, b \in \mathbb{F}_3\}$ forms a field with nine elements,
if the laws of composition are made to mimic addition and multiplication of complex
numbers.

(b) Will the same method work for $\mathbb{F}_5$? For $\mathbb{F}_7$? Explain.

*2. Let V be a vector space over an infinite field F. Prove that V is not the union of finitely
many proper subspaces.

*3. Let $W_1, W_2$ be subspaces of a vector space V. The formula
$\dim(W_1 + W_2) = \dim W_1 + \dim W_2 - \dim(W_1 \cap W_2)$ is analogous to the formula
$|S_1 \cup S_2| = |S_1| + |S_2| - |S_1 \cap S_2|$, which holds for sets. If three sets are given, then

$|S_1 \cup S_2 \cup S_3| = |S_1| + |S_2| + |S_3| - |S_1 \cap S_2| - |S_1 \cap S_3| - |S_2 \cap S_3| + |S_1 \cap S_2 \cap S_3|$.

Does the corresponding formula for dimensions of subspaces hold?

4. Let F be a field which is not of characteristic 2, and let $x^2 + bx + c = 0$ be a quadratic
equation with coefficients in F. Assume that the discriminant $b^2 - 4c$ is a square in F,
that is, that there is an element $\delta \in F$ such that $\delta^2 = b^2 - 4c$. Prove that the quadratic
formula $x = (-b + \delta)/2$ solves the quadratic equation in F, and that if the
discriminant is not a square the polynomial has no root in F.


5. (a) What are the orders of the elements


f— — 

1 1 


2 

1 

3 

1 


of $GL_2(\mathbb{R})$?


(b) Interpret the entries of these matrices as elements of $\mathbb{F}_7$, and compute their orders in
the group $GL_2(\mathbb{F}_7)$.

6. Consider the function det: $F^{2 \times 2} \to F$, where $F = \mathbb{F}_p$ is a finite field with p elements
and $F^{2 \times 2}$ is the set of $2 \times 2$ matrices.


(a) Show that this map is surjective.

(b) Prove that all nonzero values of the determinant are taken on the same number of
times.


7. Let A be an $n \times n$ real matrix. Prove that there is a polynomial
$f(t) = a_rt^r + a_{r-1}t^{r-1} + \cdots + a_1t + a_0$ which has A as root, that is, such that
$a_rA^r + a_{r-1}A^{r-1} + \cdots + a_1A + a_0I = 0$. Do this by showing that the matrices
$I, A, A^2, \ldots$ are linearly dependent.

*8. An algebraic curve in $\mathbb{R}^2$ is the locus of zeros of a polynomial f(x, y) in two variables.
By a polynomial path in $\mathbb{R}^2$ we mean a parametrized path x = x(t), y = y(t), where
x(t), y(t) are polynomials in t.

(a) Prove that every polynomial path lies on a real algebraic curve by showing that, for
sufficiently large n, the functions $x(t)^i y(t)^j$, $0 \le i, j \le n$, are linearly dependent.

(b) Determine the algebraic curve which is the image of the path $x = t^2 + t$, $y = t^3$
explicitly, and draw it.
Linear Transformations


That confusions of thought and errors of reasoning
still darken the beginnings of Algebra,
is the earnest and just complaint of sober and thoughtful men.

Sir William Rowan Hamilton


1. THE DIMENSION FORMULA

The analogue for vector spaces of a homomorphism of groups is a map

$T: V \to W$

from one vector space over a field F to another, which is compatible with addition
and scalar multiplication:

(1.1) $T(v_1 + v_2) = T(v_1) + T(v_2)$ and $T(cv) = cT(v)$,

for all $v_1, v_2$ in V and all $c \in F$. It is customary to call such a map a linear
transformation, rather than a homomorphism. However, use of the word homomorphism
would be correct too. Note that a linear transformation is compatible with linear
combinations:

(1.2) $T\left(\sum_i c_iv_i\right) = \sum_i c_iT(v_i)$.

This follows from (1.1) by induction. Note also that the first of the conditions of
(1.1) says that T is a homomorphism of additive groups $V^+ \to W^+$.

We already know one important example of a linear transformation, which is
in fact the main example: left multiplication by a matrix. Let A be an $m \times n$ matrix
with entries in F, and consider A as an operator on column vectors. It defines a
linear transformation

(1.3) $F^n \xrightarrow{\ \text{left mult. by } A\ } F^m$, $X \rightsquigarrow AX$.

Indeed, $A(X_1 + X_2) = AX_1 + AX_2$, and $A(cX) = cAX$.

Another example: Let $P_n$ be the vector space of real polynomial functions of
degree $\le n$, of the form

(1.4) $a_nx^n + \cdots + a_1x + a_0$.

The derivative $\frac{d}{dx}$ is a linear transformation from $P_n$ to $P_{n-1}$.

Let $T: V \to W$ be any linear transformation. We introduce two subspaces

(1.5) ker T = kernel of T = $\{v \in V \mid T(v) = 0\}$
im T = image of T = $\{w \in W \mid w = T(v) \text{ for some } v \in V\}$.

As one may guess from the similar case of group homomorphisms (Chapter 2,
Section 4), ker T is a subspace of V and im T is a subspace of W.

It is interesting to interpret the kernel and image in the case that T is left
multiplication by a matrix A. In that case the kernel of T is the set of solutions of the
homogeneous linear equation AX = 0. The image of T is the set of vectors $B \in F^m$ such
that the linear equation AX = B has a solution.

The main result of this section is the dimension formula ，given in the next 
theorem. 


(1.6) Theorem. Let $T: V \to W$ be a linear transformation, and assume that V is
finite-dimensional. Then

dim V = dim(ker T) + dim(im T).

The dimensions of im T and ker T are called the rank and nullity of T,
respectively. Thus (1.6) reads

(1.7) dim V = rank + nullity.

Note the analogy with the formula $|G| = |\ker| \, |\operatorname{im}|$ for homomorphisms of
groups [Chapter 2 (6.15)].

The rank and nullity of an $m \times n$ matrix A are defined to be the dimensions of
the image and kernel of left multiplication by A. Let us denote the rank by r and the
nullity by k. Then k is the dimension of the space of solutions of the equation
AX = 0. The vectors B such that the linear equation AX = B has a solution form the
image, a space whose dimension is r. The sum of these two dimensions is n.
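To see the relation r + k = n in a concrete case, one can enumerate the kernel and image of a small matrix over a finite field. The sketch below is an illustration only: it assumes the field $\mathbb{F}_3$ and a hypothetical $3 \times 4$ matrix `A`, and the helper names are invented for the example. It relies on the fact that a subspace of dimension d over $\mathbb{F}_p$ has $p^d$ elements.

```python
from itertools import product

def mat_vec_mod(A, x, p):
    """Compute the product AX with entries reduced modulo p."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) % p for row in A)

def kernel_and_image(A, p):
    """Enumerate all X in F_p^n and collect the kernel and the image of X -> AX."""
    n = len(A[0])
    kernel, image = set(), set()
    for x in product(range(p), repeat=n):
        y = mat_vec_mod(A, x, p)
        image.add(y)
        if not any(y):          # AX = 0, so X lies in the kernel
            kernel.add(x)
    return kernel, image

def dim_over_fp(subspace, p):
    """A subspace of dimension d over F_p has p**d elements."""
    d, size = 0, 1
    while size < len(subspace):
        size *= p
        d += 1
    return d

p = 3
# Hypothetical example matrix; modulo 3 its second row equals twice the first plus the third.
A = [[1, 2, 0, 1],
     [2, 1, 1, 0],
     [0, 0, 1, 1]]

kernel, image = kernel_and_image(A, p)
nullity = dim_over_fp(kernel, p)
rank = dim_over_fp(image, p)
print(rank, nullity, len(A[0]))
```

The kernel and the image are counted independently of each other here, so the equality of rank + nullity with the number of columns is observed rather than assumed.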

Let B be a vector in the image of multiplication by A, so that the equation
AX = B has at least one solution $X = X_0$. Let K denote the space of solutions of the
homogeneous equation AX = 0, the kernel of multiplication by A. Then the set of
solutions of AX = B is the additive coset $X_0 + K$. This restates a familiar fact: Adding
any solution of the homogeneous equation AX = 0 to a particular solution $X_0$ of the
inhomogeneous equation AX = B, we obtain another solution of the inhomogeneous
equation.

Suppose that A is a square $n \times n$ matrix. If $\det A \ne 0$, then, as we know, the
system of equations AX = B has a unique solution for every B, because A is
invertible. In this case, k = 0 and r = n. On the other hand, if det A = 0 then the space
K has dimension k > 0. By the dimension formula, r < n, which implies that the
image is not the whole space $F^n$. This means that not all equations AX = B have
solutions. But those that do have solutions have more than one, because the set of
solutions of AX = B is a coset of K.

Proof of Theorem (1.6). Say that dim V = n. Let $(u_1, \ldots, u_k)$ be a basis for the
subspace ker T, and extend it to a basis of V [Chapter 3 (3.15)]:

(1.8) $(u_1, \ldots, u_k; v_1, \ldots, v_{n-k})$.

Let $w_i = T(v_i)$ for $i = 1, \ldots, n - k$. If we prove that $(w_1, \ldots, w_{n-k}) = S$ is a basis
for im T, then it will follow that im T has dimension n - k. This will prove the
theorem.

So we must show that S spans im T and that it is a linearly independent set. Let
$w \in \operatorname{im} T$ be arbitrary. Then $w = T(v)$ for some $v \in V$. We write v in terms of the
basis (1.8):

$v = a_1u_1 + \cdots + a_ku_k + b_1v_1 + \cdots + b_{n-k}v_{n-k}$,

and apply T, noting that $T(u_i) = 0$:

$w = 0 + \cdots + 0 + b_1w_1 + \cdots + b_{n-k}w_{n-k}$.

Thus w is in the span of S, and so S spans im T.

Next, suppose a linear relation

(1.9) $c_1w_1 + \cdots + c_{n-k}w_{n-k} = 0$

is given, and consider the linear combination $v = c_1v_1 + \cdots + c_{n-k}v_{n-k}$, where $v_i$
are the vectors (1.8). Applying T to v gives

$T(v) = c_1w_1 + \cdots + c_{n-k}w_{n-k} = 0$.

Thus $v \in \ker T$. So we may write v in terms of the basis $(u_1, \ldots, u_k)$ of ker T, say
$v = a_1u_1 + \cdots + a_ku_k$. Then

$-a_1u_1 - \cdots - a_ku_k + c_1v_1 + \cdots + c_{n-k}v_{n-k} = 0$.

But (1.8) is a basis. So $-a_1 = 0, \ldots, -a_k = 0$, and $c_1 = 0, \ldots, c_{n-k} = 0$. Therefore
the relation (1.9) was trivial. This shows that S is linearly independent and
completes the proof.

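2. THE MATRIX OF A LINEAR TRANSFORMATION

It is not hard to show that every linear transformation $T: F^n \to F^m$ is left
multiplication by some $m \times n$ matrix A. To see this, consider the images $T(e_j)$ of the
standard basis vectors $e_j$ of $F^n$. We label the entries of these vectors as follows:

(2.1) $T(e_j) = \begin{bmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{bmatrix}$,

and we form the $m \times n$ matrix $A = (a_{ij})$ having these vectors as its columns. We can
write an arbitrary vector $X = (x_1, \ldots, x_n)^t$ from $F^n$ in the form
$X = e_1x_1 + \cdots + e_nx_n$, putting scalars on the right. Then

$T(X) = \sum_j T(e_j)x_j = \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} x_1 + \cdots + \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} x_n = AX$.

For example, the linear transformation $T: \mathbb{R}^2 \to \mathbb{R}^2$ such that

$T(e_1) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $T(e_2) = \begin{bmatrix} -1 \\ 0 \end{bmatrix}$

is left multiplication by the matrix

$A = \begin{bmatrix} 1 & -1 \\ 2 & 0 \end{bmatrix}$.

If $X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = e_1x_1 + e_2x_2$, then

$T(X) = \begin{bmatrix} 1 \\ 2 \end{bmatrix} x_1 + \begin{bmatrix} -1 \\ 0 \end{bmatrix} x_2 = \begin{bmatrix} 1 & -1 \\ 2 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 - x_2 \\ 2x_1 \end{bmatrix}$.
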
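The recipe just described, in which the columns of A are the images of the standard basis vectors, is easy to try out. The following Python sketch uses the numbers of the example, $T(e_1) = (1, 2)^t$ and $T(e_2) = (-1, 0)^t$; the function names `matrix_from_images` and `apply` and the test vector `x` are hypothetical choices made for illustration.

```python
def matrix_from_images(images):
    """Build the matrix whose j-th column is images[j] = T(e_j)."""
    m = len(images[0])
    return [[col[i] for col in images] for i in range(m)]

def apply(A, x):
    """Left multiplication of the column vector x by the matrix A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

T_e1 = [1, 2]    # image of e_1 in the example
T_e2 = [-1, 0]   # image of e_2 in the example
A = matrix_from_images([T_e1, T_e2])

x = [5, 3]       # an arbitrary test vector, chosen for illustration
via_matrix = apply(A, x)                                        # AX
via_images = [T_e1[i] * x[0] + T_e2[i] * x[1] for i in range(2)]  # T(e_1)x_1 + T(e_2)x_2
print(A, via_matrix, via_images)
```

Both computations give the vector $(x_1 - x_2, 2x_1)^t$, in agreement with the formula for T(X) in the example.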
Using the notation established in Section 4 of Chapter 3, we can make a
similar computation with an arbitrary linear transformation $T: V \to W$, once bases of
the two spaces are given. Let $B = (v_1, \ldots, v_n)$ and $C = (w_1, \ldots, w_m)$ be bases of V
and of W, and let us use the shorthand notation T(B) to denote the hypervector

$T(B) = (T(v_1), \ldots, T(v_n))$.

Since the entries of this hypervector are in the vector space W, and since C is a basis
for that space, there is an $m \times n$ matrix A such that

(2.2) $T(B) = CA$ or $(T(v_1), \ldots, T(v_n)) = (w_1, \ldots, w_m)A$

[Chapter 3 (4.13)]. Remember, this means that for each j,

(2.3) $T(v_j) = \sum_i w_ia_{ij} = w_1a_{1j} + \cdots + w_ma_{mj}$.

So A is the matrix whose jth column is the coordinate vector of $T(v_j)$. This $m \times n$
matrix $A = (a_{ij})$ is called the matrix of T with respect to the bases B, C. Different
choices of the bases lead to different matrices.

In the case that V = F n , W = F m , and the two bases are the standard bases, 
A is the matrix constructed as in (2.1). 

The matrix of a linear transformation can be used to compute the coordinates
of the image vector T(v) in terms of the coordinates of v. To do this, we write v in
terms of the basis, say

$v = BX = v_1x_1 + \cdots + v_nx_n$.

Then

$T(v) = T(v_1)x_1 + \cdots + T(v_n)x_n = T(B)X = CAX$.

Therefore the coordinate vector of T(v) is

$Y = AX$,

meaning that T(v) = CY. Recapitulating, the matrix A of the linear transformation
has two dual properties:

(2.4) $T(B) = CA$ and $Y = AX$.

The relationship between T and A can be explained in terms of the
isomorphisms $\psi: F^n \to V$ and $\psi': F^m \to W$ determined by the two bases [Chapter 3
(4.14)]. If we use $\psi$ and $\psi'$ to identify V and W with $F^n$ and $F^m$, then T corresponds
to left multiplication by A:

(2.5) $\begin{array}{ccc} F^n & \xrightarrow{\ A\ } & F^m \\ \psi \downarrow & & \downarrow \psi' \\ V & \xrightarrow{\ T\ } & W \end{array} \qquad \begin{array}{ccc} X & \rightsquigarrow & AX \\ \downarrow & & \downarrow \\ BX & \rightsquigarrow & CAX \end{array}$

Going around this square in the two directions gives the same answer:
$T \circ \psi = \psi' \circ A$.

Thus any linear transformation between finite-dimensional vector spaces V and
W can be identified with matrix multiplication, once bases for the two spaces are
chosen. But if we study changes of basis in V and W, we can do much better. Let us
ask how the matrix A changes when we make other choices of bases for V and W.
Let $B' = (v_1', \ldots, v_n')$, $C' = (w_1', \ldots, w_m')$ be new bases for these spaces. We can
relate the new basis $B'$ to the old basis B by a matrix $P \in GL_n(F)$, as in Chapter 3
(4.19). Similarly, $C'$ is related to C by a matrix $Q \in GL_m(F)$. These matrices have
the following properties:

(2.6) $PX = X'$ and $QY = Y'$.

Here X and $X'$ denote the coordinate vectors of a vector $v \in V$ with respect to the
bases B and $B'$, and similarly Y and $Y'$ denote the coordinate vectors of a vector
$w \in W$ with respect to C and $C'$.

Let $A'$ denote the matrix of T with respect to the new bases, defined as above
(2.4), so that $A'X' = Y'$. Then $QAP^{-1}X' = QAX = QY = Y'$. Therefore

(2.7) $A' = QAP^{-1}$.

Note that P and Q are arbitrary invertible $n \times n$ and $m \times m$ matrices [Chapter 3
(4.23)]. Hence we obtain the following description of the matrices of a given linear
transformation:

(2.8) Proposition. Let A be the matrix of a linear transformation T with respect to
some given bases B, C. The matrices $A'$ which represent T with respect to other bases
are those of the form

$A' = QAP^{-1}$,

where $Q \in GL_m(F)$ and $P \in GL_n(F)$ are arbitrary invertible matrices. □
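The rule $A' = QAP^{-1}$ can be tested numerically against the properties $PX = X'$ and $QY = Y'$. The sketch below is an illustration under assumptions: it works over the rational numbers with m = n = 2, and the particular matrices `A`, `P`, `Q` and the vector `X` are hypothetical values chosen for the example, as are the helper names.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix, using exact rational arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, -1], [2, 0]]   # matrix of T with respect to the old bases (assumed)
P = [[1, 1], [0, 1]]    # invertible, with PX = X'
Q = [[2, 1], [1, 1]]    # invertible, with QY = Y'

A_new = matmul(matmul(Q, A), inverse_2x2(P))   # A' = Q A P^{-1}

X = [[3], [4]]          # old coordinate vector of some v
Y = matmul(A, X)        # old coordinate vector of T(v)
X_new = matmul(P, X)    # new coordinate vector of v
Y_new = matmul(Q, Y)    # new coordinate vector of T(v)

print(matmul(A_new, X_new) == Y_new)
```

The final comparison checks that $A'X' = Y'$, which is the defining property of the matrix of T with respect to the new bases.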

Now given a linear transformation $T: V \to W$, it is natural to look for bases
B, C of V and W such that the matrix of T becomes especially nice. In fact the matrix
can be simplified remarkably.

(2.9) Proposition.

(a) Vector space form: Let $T: V \to W$ be a linear transformation. Bases B, C can
be chosen so that the matrix of T takes the form

(2.10) $A = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}$,

where $I_r$ is the $r \times r$ identity matrix, and r = rank T.

(b) Matrix form: Given any $m \times n$ matrix A, there are matrices $Q \in GL_m(F)$ and
$P \in GL_n(F)$ so that $QAP^{-1}$ has the form (2.10).


It follows from our discussion that these two assertions amount to the same thing. To
derive (a) from (b), choose arbitrary bases B, C to start with, and let A be the matrix
of T with respect to these bases. Applying (b), we can find P, Q so that $QAP^{-1}$ has the
required form. Let $B' = BP^{-1}$ and $C' = CQ^{-1}$ be the new bases, as in Chapter 3
(4.22). Then the matrix of T with respect to the bases $B'$, $C'$ is $QAP^{-1}$. So these new
bases are the required ones. Conversely, to derive (b) from (a) we view an arbitrary
matrix A as the matrix of the linear transformation "left multiplication by A", with
respect to the standard bases. Then (a) and (2.7) guarantee the existence of P, Q so
that $QAP^{-1}$ has the required form.

Note that we can interpret $QAP^{-1}$ as the matrix obtained from A by a succession
of row and column operations: We write P and Q as products of elementary
matrices: $P = E_p \cdots E_1$ and $Q = E_q' \cdots E_1'$ [Chapter 1 (2.18)]. Then
$QAP^{-1} = E_q' \cdots E_1' A E_1^{-1} \cdots E_p^{-1}$. Because of the associative law, it does not matter whether
the row operations or the column operations are done first. The equation
$(E'A)E = E'(AE)$ tells us that row operations commute with column operations.

It is not hard to prove (2.9b) by matrix manipulation, but let us prove (2.9a)
using bases instead. Let $(u_1, \ldots, u_k)$ be a basis for ker T. Extend to a basis B for
V: $(v_1, \ldots, v_r; u_1, \ldots, u_k)$, where r + k = n. Let $w_i = T(v_i)$. Then, as in the proof of
(1.6), $(w_1, \ldots, w_r)$ is a basis for im T. Extend to a basis C of W: $(w_1, \ldots, w_r; x_1, \ldots, x_s)$.
The matrix of T with respect to these bases has the required form. □

Proposition (2.9) is the prototype for a number of results which will be proved
later. It shows the power of working in vector spaces without fixed bases (or
coordinates), because the structure of an arbitrary linear transformation is related to the
very simple matrix (2.10). It also tells us something remarkable about matrix
multiplication, because left multiplication by A on $F^n$ is a linear transformation. Namely,
it says that left multiplication by A is the same as left multiplication by a matrix of
the form (2.10), but with reference to different coordinate systems. Since
multiplication by the matrix (2.10) is easy to describe, we have learned something new.


3. LINEAR OPERATORS AND EIGENVECTORS 


Let us now consider the case of a linear transformation $T: V \to V$ of a vector space
to itself. Such a linear transformation is called a linear operator on V. Left
multiplication by an $n \times n$ matrix with entries in F defines a linear operator on the space $F^n$
of column vectors.

For example, a rotation $\rho_\theta$ of the plane through an angle $\theta$ is a linear operator
on $\mathbb{R}^2$, whose matrix with respect to the standard basis is

(3.1) $R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.

To verify that this matrix represents a rotation, we write a vector $X \in \mathbb{R}^2$ in polar
coordinates, as $X = (r, \alpha)$. Then in rectangular coordinates,
$X = \begin{bmatrix} r\cos\alpha \\ r\sin\alpha \end{bmatrix}$. The addition formulas for sine and cosine show that
$RX = \begin{bmatrix} r\cos(\alpha + \theta) \\ r\sin(\alpha + \theta) \end{bmatrix}$. So in polar
coordinates, $RX = (r, \alpha + \theta)$. This shows that RX is obtained from X by rotation
through the angle $\theta$.
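The verification can also be carried out numerically. This is a small sketch in floating-point arithmetic, so equalities hold only approximately; the values of `r`, `alpha`, and `theta` are arbitrary choices for the example, and `rotate` and `to_polar` are hypothetical helper names.

```python
import math

def rotate(theta, x):
    """Multiply the column vector x by the rotation matrix R of (3.1)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def to_polar(x):
    """Return (r, angle) for a nonzero vector, with the angle in (-pi, pi]."""
    return (math.hypot(x[0], x[1]), math.atan2(x[1], x[0]))

r, alpha, theta = 2.0, 0.3, 0.5                    # arbitrary sample values
X = (r * math.cos(alpha), r * math.sin(alpha))     # X in rectangular coordinates
r_new, angle_new = to_polar(rotate(theta, X))      # RX in polar coordinates

print(r_new, angle_new)
```

Up to rounding, the length is unchanged and the angle increases by `theta`, as the addition formulas predict.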

The discussion of the previous section must be changed slightly when we are
dealing with linear operators. It is clear that we want to pick only one basis
$B = (v_1, \ldots, v_n)$ for V, and use it in place of both of the bases B and C considered in
Section 2. In other words, we want to write

(3.2) $T(B) = BA$ or $T(v_j) = \sum_i v_ia_{ij} = v_1a_{1j} + \cdots + v_na_{nj}$.

This defines the matrix $A = (a_{ij})$ of T. It is a square matrix whose jth column is the
coordinate vector of $T(v_j)$ with respect to the basis B. Formula (2.4) is unchanged,
provided that W and C are replaced by V and B. As in the previous section, if X and
Y denote the coordinate vectors of v and T(v) respectively, then

(3.3) $Y = AX$.

The new feature arises when we study the effect of a change of basis on V.
Suppose that B is replaced by a new basis $B' = (v_1', \ldots, v_n')$. Then formula (2.7)
shows that the new matrix $A'$ has the form

(3.4) $A' = PAP^{-1}$,

where P is the matrix of change of basis. Thus the rule for change of basis in a linear
transformation gets replaced by the following rule:

(3.5) Proposition. Let A be the matrix of a linear operator T with respect to a
basis B. The matrices $A'$ which represent T for different bases are those of the form

$A' = PAP^{-1}$,

for arbitrary $P \in GL_n(F)$. □

In general, we say that a square matrix A is similar to $A'$ if $A' = PAP^{-1}$ for
some $P \in GL_n(F)$. We could also use the word conjugate [see Chapter 2 (3.4)].
Now given A, it is natural to ask for a similar matrix $A'$ which is particularly
simple. One may hope to get a result somewhat like (2.10). But here our allowable
change is much more restricted, because we have only one basis, and therefore one
matrix P, to work with.

We can get some insight into the problem by writing the hypothetical matrix P
as a product of elementary matrices: $P = E_r \cdots E_1$. Then

$PAP^{-1} = E_r \cdots E_1 A E_1^{-1} \cdots E_r^{-1}$.

In terms of elementary operations, we are allowed to change A by a sequence of
steps $A \rightsquigarrow EAE^{-1}$. In other words, we may perform an arbitrary row operation E,
but then we must also make the inverse column operation $E^{-1}$. Unfortunately, the
row and column operations interfere with each other, and this makes the direct
analysis of such operations confusing. I don't know how to use them. It is remarkable
that a great deal can be done by another method.

The main tools for analyzing linear operators are the concepts of eigenvector 
and invariant subspace. 

Let T: V - > V be a linear operator on a vector space. A subspace W of V is 

called an invariant subspace or a T-invariant subspace if it is carried to itself by the 
operator: 

(3.6) TW C W. 

In other words, W is T-invariant if T(w) G W for all vv E When this is so, T 
defines a linear operator on W, called the restriction of T to W. 

Let $W$ be a $T$-invariant subspace, and let us choose a basis $\mathbf{B}$ of $V$ by appending
some vectors to a basis $(w_1, \ldots, w_k)$ of $W$:

$$\mathbf{B} = (w_1, \ldots, w_k, v_1, \ldots, v_{n-k}).$$

Then the fact that $W$ is invariant can be read off from the matrix $M$ of $T$. For, the
columns of this matrix are the coordinate vectors of the image vectors [see (2.3)],
and $T(w_j)$ is in the subspace $W$, so it is a linear combination of the basis $(w_1, \ldots, w_k)$.
So when we write $T(w_j)$ in terms of the basis $\mathbf{B}$, the coefficients of the vectors
$v_1, \ldots, v_{n-k}$ are zero. It follows that $M$ has the block form

(3.7) $$M = \begin{bmatrix} A & B \\ 0 & D \end{bmatrix},$$

where $A$ is a $k \times k$ matrix. Moreover, $A$ is the matrix of the restriction of $T$ to $W$.

Suppose that $V = W_1 \oplus W_2$ is the direct sum of two $T$-invariant subspaces, and
let $\mathbf{B}_i$ be a basis of $W_i$. Then we can make a basis $\mathbf{B}$ of $V$ by listing the elements of $\mathbf{B}_1$
and $\mathbf{B}_2$ in succession [Chapter 3 (6.6a)]. In this case the matrix of $T$ will have the
block diagonal form

(3.8) $$M = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix},$$

where $A_i$ is the matrix of $T$ restricted to $W_i$.

The concept of an eigenvector is closely related to that of an invariant subspace.
An eigenvector $v$ for a linear operator $T$ is a nonzero vector such that

(3.9) $$T(v) = cv$$

for some scalar $c \in F$. Here $c$ is allowed to take the value 0, but the vector $v$ can
not be zero. Geometrically, if $V = \mathbb{R}^n$, an eigenvector is a nonzero vector $v$ such
that $v$ and $T(v)$ are parallel.

The scalar $c$ appearing in (3.9) is called the eigenvalue associated to the eigenvector
$v$. When we speak of an eigenvalue of a linear operator $T$, we mean a scalar
$c \in F$ which is the eigenvalue associated to some eigenvector.

For example, the standard basis vector $e_1$ is an eigenvector for left multiplication
by the matrix

$$\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}.$$

The eigenvalue associated to the eigenvector $e_1$ is 3. Or, the vector $(0, 1, 1)^t$ is an
eigenvector for multiplication by the matrix

$$A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 1 & 1 \\ 3 & 0 & 2 \end{bmatrix}$$

on the space $\mathbb{R}^3$ of column vectors, and its eigenvalue is 2.
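The two examples above can be checked mechanically. The following is a minimal sketch, not part of the original text: the helper names `mat_vec` and `eigenvalue_of` are our own, and matrices are represented as lists of rows with integer entries.

```python
from fractions import Fraction

def mat_vec(A, x):
    """Multiply a square matrix (list of rows) by a column vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def eigenvalue_of(A, x):
    """Return c if A x = c x for the nonzero vector x, otherwise None."""
    if all(xi == 0 for xi in x):
        raise ValueError("an eigenvector must be nonzero")
    y = mat_vec(A, x)
    # The first nonzero coordinate of x determines the only possible scalar c.
    k = next(i for i, xi in enumerate(x) if xi != 0)
    c = Fraction(y[k], x[k])
    return c if all(yi == c * xi for yi, xi in zip(y, x)) else None

A2 = [[3, 1], [0, 2]]
A3 = [[1, 1, -1], [2, 1, 1], [3, 0, 2]]

print(eigenvalue_of(A2, [1, 0]))      # e_1 for the 2 x 2 matrix: prints 3
print(eigenvalue_of(A3, [0, 1, 1]))   # (0,1,1)^t for the 3 x 3 matrix: prints 2
print(eigenvalue_of(A3, [1, 0, 0]))   # e_1 is not an eigenvector here: prints None
```

The check only tests whether $AX$ is a multiple of $X$; it does not find eigenvectors.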

Sometimes eigenvectors and eigenvalues are called characteristic vectors and
characteristic values.

Let $v$ be an eigenvector for a linear operator $T$. The subspace $W$ spanned by $v$ is
$T$-invariant, because $T(av) = acv \in W$ for all $a \in F$. Conversely, if this subspace
is invariant, then $v$ is an eigenvector. So an eigenvector can be described as a basis
of a one-dimensional $T$-invariant subspace. If $v$ is an eigenvector, and if we extend it
to a basis $(v = v_1, \ldots, v_n)$ of $V$, then the matrix of $T$ will have the block form

$$\begin{bmatrix} c & * \\ 0 & * \end{bmatrix},$$

where $c$ is the eigenvalue associated to $v_1$. This is the block decomposition (3.7) in
the case of an invariant subspace of dimension 1.

When we speak of an eigenvector for an $n \times n$ matrix $A$, we mean a vector
which is an eigenvector for left multiplication by $A$, a nonzero column vector such
that

$$AX = cX, \quad \text{for some } c \in F.$$

As before, the scalar $c$ is called an eigenvalue. Suppose that $A$ is the matrix of $T$ with
respect to a basis $\mathbf{B}$, and let $X$ denote the coordinate vector of a vector $v \in V$. Then
$T(v)$ has coordinates $AX$ (2.4). Hence $X$ is an eigenvector for $A$ if and only if $v$ is an
eigenvector for $T$. Moreover, if so, then the eigenvalues are the same: $T$ and $A$ have
the same eigenvalues.

(3.10) Corollary. Similar matrices have the same eigenvalues.

This follows from the fact (3.5) that similar matrices represent the same linear
transformation. □

Eigenvectors aren't always easy to find, but it is easy to tell whether or not a
given vector $X$ is an eigenvector for a matrix $A$. We need only check whether or not
$AX$ is a multiple of $X$. So we can tell whether or not a given vector $v$ is an eigenvector
for a linear operator $T$, provided that the coordinate vector of $v$ and the matrix of
$T$ with respect to a basis are known. If we do this for one of the basis vectors, we
find the following criterion:

(3.11) The basis vector $v_j$ is an eigenvector of $T$, with eigenvalue $c$,
if and only if the $j$th column of $A$ has the form $ce_j$.

For the matrix $A$ is defined by the property $T(v_j) = v_1 a_{1j} + \cdots + v_n a_{nj}$. So if
$T(v_j) = cv_j$, then $a_{jj} = c$ and $a_{ij} = 0$ if $i \neq j$. □

(3.12) Corollary. With the above notation, $A$ is a diagonal matrix if and only if
every basis vector $v_j$ is an eigenvector. □

(3.13) Corollary. The matrix $A$ of a linear transformation is similar to a diagonal
matrix if and only if there is a basis $\mathbf{B}' = (v_1', \ldots, v_n')$ of $V$ made up of
eigenvectors. □

This last corollary shows that we can represent a linear operator very simply
by a diagonal matrix, provided that it has enough eigenvectors. We will see in
Section 4 that every linear operator on a complex vector space has at least one
eigenvector, and in Section 6 that in most cases the eigenvectors form a basis. But a linear
operator on a real vector space needn't have an eigenvector. For example, the rotation
$\rho_\theta$ (3.1) of the plane does not carry any vector to a parallel one, unless $\theta = 0$ or
$\pi$. So $\rho_\theta$ has no eigenvector unless $\theta = 0$ or $\pi$.

The situation is quite different for real matrices having positive entries. Such
matrices are sometimes called positive matrices. They occur often in applications,
and one of their most important properties is that they always have an eigenvector
whose coordinates are positive (a positive eigenvector). Instead of proving this fact,
let us illustrate it in the case of two variables by examining the effect of multiplication
by a positive $2 \times 2$ matrix $A$ on $\mathbb{R}^2$.

Let $w_i = Ae_i$. The parallelogram law for vector addition shows that $A$ sends the
first quadrant $S$ to the sector bounded by the vectors $w_1, w_2$. And the coordinate
vector of $w_i$ is the $i$th column of $A$. Since the entries of $A$ are positive, the vectors $w_i$ lie
in the first quadrant. So $A$ carries the first quadrant to itself: $S \supset AS$. Applying $A$
again, we find $AS \supset A^2S$, and so on:

(3.14) $$S \supset AS \supset A^2S \supset A^3S \supset \cdots,$$

as illustrated below in Figure (3.15) for the matrix

$$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}.$$

(3.15) Figure. Images of the first quadrant under repeated multiplication
by a positive matrix.

Now the intersection of a nested set of sectors is either a sector or a half line.
In our case, the intersection $Z = \bigcap A^rS$ turns out to be a half line. This is intuitively
plausible, and it can be shown in various ways. The proof is left as an exercise. We
multiply the relation $Z = \bigcap A^rS$ on both sides by $A$:

$$AZ = A\Bigl(\bigcap_{r=0}^{\infty} A^rS\Bigr) = \bigcap_{r=1}^{\infty} A^rS = Z.$$

Hence $Z = AZ$. This shows that the nonzero vectors in $Z$ are eigenvectors.

□
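The nesting (3.14) can be watched numerically. The sketch below is an illustration under our own conventions, not the author's: it tracks the two edges of the sector $A^rS$ for the matrix of Figure (3.15) and reports their slopes. The function names are hypothetical, and the vectors are rescaled at each step only to keep the numbers small, which does not change their directions.

```python
def apply(A, x):
    """Multiply a 2 x 2 matrix by a vector in the plane."""
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def normalize(x):
    """Rescale a nonzero vector so that its largest coordinate has size 1."""
    s = max(abs(x[0]), abs(x[1]))
    return [x[0] / s, x[1] / s]

def sector_slopes(A, r):
    """Slopes y/x of the two edges A^r e_1 and A^r e_2 of the sector A^r S, r >= 1."""
    if r < 1:
        raise ValueError("r must be at least 1 (the edge e_2 of S is vertical)")
    w1, w2 = [1.0, 0.0], [0.0, 1.0]    # the edges of the first quadrant S
    for _ in range(r):
        w1 = normalize(apply(A, w1))
        w2 = normalize(apply(A, w2))
    return w1[1] / w1[0], w2[1] / w2[0]

A = [[3.0, 2.0], [1.0, 4.0]]
for r in (1, 2, 5, 40):
    print(r, sector_slopes(A, r))
```

For this matrix the two slopes close in on 1 from below and from above, so the sectors shrink toward the half line spanned by $(1, 1)^t$, in agreement with $Z = AZ$.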
4. THE CHARACTERISTIC POLYNOMIAL

In this section we determine the eigenvectors of an arbitrary linear operator $T$.
Recall that an eigenvector for $T$ is a nonzero vector $v$ such that

(4.1) $$T(v) = cv,$$

for some $c$ in $F$. At first glance, it seems difficult to find eigenvectors if the matrix of
the linear operator is complicated. The trick is to solve a different problem, namely
to determine the eigenvalues first. Once an eigenvalue $c$ is determined, equation
(4.1) becomes linear in the coordinates of $v$, and solving it presents no problem.

We begin by writing (4.1) in the form

(4.2) $$[T - cI](v) = 0,$$

where $I$ stands for the identity operator and $T - cI$ is the linear operator defined by

(4.3) $$[T - cI](v) = T(v) - cv.$$

It is easy to check that $T - cI$ is indeed a linear operator. If $A$ is the matrix of $T$ with
respect to some basis, then the matrix of $T - cI$ is $A - cI$.
We can restate (4.2) as follows:

(4.4) $v$ is in the kernel of $T - cI$.

(4.5) Lemma. The following conditions on a linear operator $T: V \to V$ on a
finite-dimensional vector space are equivalent:

(a) $\ker T > 0$.

(b) $\operatorname{im} T < V$.

(c) If $A$ is the matrix of the operator with respect to an arbitrary basis, then
$\det A = 0$.

(d) 0 is an eigenvalue of $T$.

Proof. The dimension formula (1.6) shows that $\ker T > 0$ if and only if
$\operatorname{im} T < V$. This is true if and only if $T$ is not an isomorphism, or, equivalently, if
and only if $A$ is not an invertible matrix. And we know that the square matrices $A$
which are not invertible are those with determinant zero. This shows the equivalence
of (a), (b), and (c). Finally, the nonzero vectors in the kernel of $T$ are the
eigenvectors with eigenvalue zero. Hence (a) is equivalent to (d). □

The conditions (4.5a) and (4.5b) are not equivalent for infinite-dimensional
vector spaces. For example, let $V = \mathbb{R}^\infty$ be the space of infinite row vectors
$(a_1, a_2, \ldots)$, as in Section 5 of Chapter 3. The shift operator $T$, defined by

(4.6) $$T(a_1, a_2, \ldots) = (0, a_1, a_2, \ldots),$$

is a linear operator on $V$. For this operator, $\ker T = 0$ but $\operatorname{im} T < V$.

(4.7) Definition. A linear operator $T$ on a finite-dimensional vector space $V$ is
called singular if it satisfies any of the equivalent conditions of (4.5). Otherwise, $T$
is nonsingular.

We know that $c$ is an eigenvalue for the operator $T$ if and only if $T - cI$ has a
nonzero kernel (4.4). So, if we replace $T$ by $T - cI$ in the lemma above, we find:

(4.8) Corollary. The eigenvalues of a linear operator $T$ are the scalars $c \in F$ such
that $T - cI$ is singular. □

If $A$ is the matrix of $T$ with respect to some basis, then the matrix of $T - cI$ is
$A - cI$. So $T - cI$ is singular if and only if $\det(A - cI) = 0$. This determinant can
be computed explicitly, and doing so provides us with a concrete method for
determining the eigenvalues and eigenvectors.

Suppose for example that $A$ is the matrix

(4.9) $$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix},$$

whose action on $\mathbb{R}^2$ is illustrated in Figure (3.15). Then

$$A - cI = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix} - \begin{bmatrix} c & 0 \\ 0 & c \end{bmatrix} = \begin{bmatrix} 3 - c & 2 \\ 1 & 4 - c \end{bmatrix}$$

and

$$\det(A - cI) = c^2 - 7c + 10 = (c - 5)(c - 2).$$

This determinant vanishes if $c = 5$ or $2$, so we have shown that the eigenvalues of $A$
are 5 and 2. To find the eigenvectors, we solve the two systems of linear equations
$[A - 5I]X = 0$ and $[A - 2I]X = 0$. The solutions are unique up to scalar factor:

(4.10) $$v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}.$$

Note that the eigenvector $v_1$ with eigenvalue 5 is in the first quadrant. It lies on the
half line $Z$ which is illustrated in Figure (3.15).
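The same two-step recipe, eigenvalues first and then a linear system for each, can be sketched for any real $2 \times 2$ matrix whose characteristic roots are real. This is only an illustrative sketch; the function name `eigen_2x2` and the particular choice of solution vector are our own assumptions.

```python
import math

def eigen_2x2(A):
    """Real eigenvalues and one eigenvector each, for a real 2 x 2 matrix
    [[a, b], [c, d]] whose characteristic polynomial has real roots."""
    (a, b), (c, d) = A
    if b == 0 and c == 0:
        # Diagonal matrix: the standard basis vectors are eigenvectors.
        return [(a, (1, 0)), (d, (0, 1))]
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det              # discriminant of t^2 - (a+d)t + (ad-bc)
    if disc < 0:
        raise ValueError("the characteristic polynomial has no real root")
    root = math.sqrt(disc)
    result = []
    for lam in ((tr + root) / 2, (tr - root) / 2):
        # Solve (A - lam I) X = 0 using whichever row is visibly nonzero.
        if b != 0:
            vec = (b, lam - a)            # first row: (a - lam) x + b y = 0
        else:
            vec = (lam - d, c)            # second row: c x + (d - lam) y = 0
        result.append((lam, vec))
    return result

print(eigen_2x2([[3, 2], [1, 4]]))
```

For the matrix (4.9) the vectors returned are $(2, 2)^t$ and $(2, -1)^t$, which are scalar multiples of $v_1$ and $v_2$ in (4.10).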

We now make the same computation with an arbitrary matrix. It is convenient
to change sign. Obviously $\det(cI - A) = 0$ if and only if $\det(A - cI) = 0$. Also, it
is customary to replace the symbol $c$ by a variable $t$. We form the matrix $tI - A$:

(4.11) $$tI - A = \begin{bmatrix} (t - a_{11}) & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & (t - a_{22}) & \cdots & -a_{2n} \\ \vdots & & \ddots & \vdots \\ -a_{n1} & \cdots & \cdots & (t - a_{nn}) \end{bmatrix}.$$

Then the complete expansion of the determinant [Chapter 1 (4.11)] shows that
$\det(tI - A)$ is a polynomial of degree $n$ in $t$, whose coefficients are scalars.

(4.12) Definition. The characteristic polynomial of a linear operator $T$ is the
polynomial

$$p(t) = \det(tI - A),$$

where $A$ is the matrix of $T$ with respect to some basis.

The eigenvalues of $T$ are determined by combining (4.8) and (4.12): $c$ is an
eigenvalue if and only if $p(c) = 0$.

(4.13) Corollary. The eigenvalues of a linear operator are the roots of its
characteristic polynomial. □

(4.14) Corollary. The eigenvalues of an upper or lower triangular matrix are its
diagonal entries.

Proof. If $A$ is an upper triangular matrix, then so is $tI - A$. The determinant
of a triangular matrix is the product of its diagonal entries, and the diagonal entries
of $tI - A$ are $t - a_{ii}$. Therefore the characteristic polynomial is
$p(t) = (t - a_{11})(t - a_{22}) \cdots (t - a_{nn})$, and its roots, the eigenvalues, are $a_{11}, \ldots, a_{nn}$. □


We can compute the characteristic polynomial of an arbitrary $2 \times 2$ matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

without difficulty. It is

(4.15) $$\det(tI - A) = \det\begin{bmatrix} t - a & -b \\ -c & t - d \end{bmatrix} = t^2 - (a + d)t + (ad - bc).$$

The discriminant of this polynomial is

(4.16) $$(a + d)^2 - 4(ad - bc) = (a - d)^2 + 4bc.$$

If the entries of $A$ are positive real numbers, then the discriminant is also positive,
and therefore the characteristic polynomial has real roots, as predicted at the end of
Section 3.


(4.17) Proposition. The characteristic polynomial of an operator $T$ does not
depend on the choice of a basis.

Proof. A second basis leads to a matrix $A' = PAP^{-1}$ [see (3.4)]. We have

$$tI - A' = tI - PAP^{-1} = P(tI)P^{-1} - PAP^{-1} = P(tI - A)P^{-1}.$$

Thus

$$\det(tI - A') = \det(P(tI - A)P^{-1}) = \det P \, \det(tI - A) \, \det P^{-1} = \det(tI - A).$$

So the characteristic polynomials computed with $A$ and $A'$ are equal, as was
asserted. □
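A small exact computation illustrates the proposition. In the sketch below, which is our own illustration, the matrix $P$ is an arbitrary invertible matrix chosen for the example, the helper names are hypothetical, and `Fraction` is used so that no rounding occurs.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_2x2(P):
    """Inverse of an invertible 2 x 2 integer matrix, with exact fractions."""
    (p, q), (r, s) = P
    det = p * s - q * r
    if det == 0:
        raise ValueError("P is not invertible")
    return [[Fraction(s, det), Fraction(-q, det)],
            [Fraction(-r, det), Fraction(p, det)]]

def charpoly_2x2(A):
    """Coefficients (1, -(a+d), ad-bc) of t^2 - (a+d)t + (ad-bc), as in (4.15)."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)

A = [[3, 2], [1, 4]]
P = [[2, 1], [5, 3]]                       # an arbitrary invertible matrix
A_prime = matmul(matmul(P, A), inverse_2x2(P))

print(A_prime)                             # entries -19, 9, -56, 26 (as Fractions)
print(charpoly_2x2(A), charpoly_2x2(A_prime))
```

The matrices $A$ and $A'$ look quite different, yet both have characteristic polynomial $t^2 - 7t + 10$.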
(4.18) Proposition. The characteristic polynomial $p(t)$ has the form

$$p(t) = t^n - (\operatorname{tr} A)t^{n-1} + (\text{intermediate terms}) + (-1)^n(\det A),$$

where $\operatorname{tr} A$, the trace of $A$, is the sum of the diagonal entries:

$$\operatorname{tr} A = a_{11} + a_{22} + \cdots + a_{nn}.$$

All coefficients are independent of the basis. For instance $\operatorname{tr} PAP^{-1} = \operatorname{tr} A$.

This is proved by computation. The independence of the basis follows from (4.17). □
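The shape of $p(t)$ can be checked on examples. The sketch below computes all coefficients with the Faddeev-LeVerrier recursion, a standard method that is not used in the text and is brought in here only as a computational device. The function name `charpoly` is our own, and exact fractions are used throughout.

```python
from fractions import Fraction

def charpoly(A):
    """Coefficients [c_0, ..., c_n] of det(tI - A) = c_0 + c_1 t + ... + c_n t^n,
    computed with the Faddeev-LeVerrier recursion."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(0)] * n + [Fraction(1)]      # leading coefficient c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]       # M_0 = 0
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I
        AM = [[sum(A[i][l] * M[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        # c_{n-k} = -tr(A M_k) / k
        trace_AM = sum(A[i][l] * M[l][i] for i in range(n) for l in range(n))
        coeffs[n - k] = -trace_AM / k
    return coeffs

A3 = [[1, 1, -1], [2, 1, 1], [3, 0, 2]]
print(charpoly([[3, 2], [1, 4]]))   # coefficients 10, -7, 1
print(charpoly(A3))                 # coefficients -4, 6, -4, 1
```

For the $3 \times 3$ matrix, the trace is 4 and the determinant is 4, so the coefficient of $t^2$ is $-4$ and the constant term is $(-1)^3 \cdot 4 = -4$, as the proposition asserts.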

Since the characteristic polynomial, the trace, and the determinant are independent
of the basis, they depend only on the operator $T$. So we may define the
terms characteristic polynomial, trace, and determinant of a linear operator $T$ to be
those obtained using the matrix of $T$ with respect to an arbitrary basis.

(4.19) Proposition. Let $T$ be a linear operator on a finite-dimensional vector space $V$.

(a) If $V$ has dimension $n$, then $T$ has at most $n$ eigenvalues.

(b) If $F$ is the field of complex numbers and $V \neq 0$, then $T$ has at least one
eigenvalue, and hence it has an eigenvector.

Proof.

(a) A polynomial of degree $n$ can have at most $n$ different roots. This is true for
any field $F$, though we have not proved it yet [see Chapter 11 (1.8)]. So we
can apply (4.13).

(b) Every polynomial of positive degree with complex coefficients has at least one
complex root. This fact is called the Fundamental Theorem of Algebra. There
is a proof in Chapter 13 (9.1). □

For example, let $A$ be the rotation (3.1) of the real plane $\mathbb{R}^2$ by an angle $\theta$. Its
characteristic polynomial is

(4.20) $$p(t) = t^2 - (2\cos\theta)t + 1,$$

which has no real root unless $\cos\theta = \pm 1$. But if we view $A$ as an operator on $\mathbb{C}^2$,
there are two complex eigenvalues.
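The two complex eigenvalues can be computed directly from (4.20) with the quadratic formula. The following sketch is our own illustration (the function name is hypothetical) and uses floating-point complex arithmetic, so equalities hold only up to rounding.

```python
import cmath
import math

def rotation_eigenvalues(theta):
    """Roots of t^2 - (2 cos theta) t + 1, by the quadratic formula over C."""
    b = -2 * math.cos(theta)
    disc = cmath.sqrt(b * b - 4)          # b^2 - 4 = -4 sin^2(theta) is <= 0
    return ((-b + disc) / 2, (-b - disc) / 2)

theta = math.pi / 3
r1, r2 = rotation_eigenvalues(theta)
print(r1, r2)
print(abs(r1), abs(r2))                   # both roots lie on the unit circle
```

The roots are $\cos\theta \pm i\sin\theta$, so they are real exactly when $\sin\theta = 0$, which is the condition $\cos\theta = \pm 1$.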


5. ORTHOGONAL MATRICES AND ROTATIONS

In this section we describe the rotations of two- and three-dimensional spaces $\mathbb{R}^2$
and $\mathbb{R}^3$ about the origin as linear operators. We have already noted (3.1) that a
rotation of $\mathbb{R}^2$ through an angle $\theta$ is represented as multiplication by the matrix

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

A rotation of $\mathbb{R}^3$ about the origin can be described by a pair $(v, \theta)$ consisting of a unit
vector $v$, a vector of length 1, which lies in the axis of rotation, and a nonzero angle
$\theta$, the angle of rotation. The two pairs $(v, \theta)$ and $(-v, -\theta)$ represent the same
rotation. We also consider the identity map to be a rotation, though its axis is
indeterminate.



(5.1) Figure.


The matrix representing a rotation through the angle $\theta$ about the vector $e_1$ is
obtained easily from the $2 \times 2$ rotation matrix. It is

(5.2) $$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}.$$

Multiplication by $A$ fixes the first coordinate $x_1$ of a vector and operates by rotation
on $(x_2, x_3)^t$. All rotations of $\mathbb{R}^3$ are linear operators, but their matrices can be fairly
complicated. The object of this section is to describe these rotation matrices.

A real $n \times n$ matrix $A$ is called orthogonal if $A^t = A^{-1}$, or, equivalently, if
$A^tA = I$. The orthogonal $n \times n$ matrices form a subgroup of $GL_n(\mathbb{R})$ denoted by $O_n$
and called the orthogonal group:

(5.3) $$O_n = \{A \in GL_n(\mathbb{R}) \mid A^tA = I\}.$$

The determinant of an orthogonal matrix is $\pm 1$, because if $A^tA = I$, then

$$(\det A)^2 = (\det A^t)(\det A) = 1.$$

The orthogonal matrices having determinant $+1$ form a subgroup called the special
orthogonal group and denoted by $SO_n$:

(5.4) $$SO_n = \{A \in GL_n(\mathbb{R}) \mid A^tA = I, \ \det A = 1\}.$$

This subgroup has one coset in addition to $SO_n$, namely the set of elements with
determinant $-1$. So it has index 2 in $O_n$.

The main fact which we will prove about rotations is stated below: 

(5.5) Theorem. The rotations of $\mathbb{R}^2$ or $\mathbb{R}^3$ about the origin are the linear operators
whose matrices with respect to the standard basis are orthogonal and have
determinant 1. In other words, a matrix $A$ represents a rotation of $\mathbb{R}^2$ (or $\mathbb{R}^3$) if and only if
$A \in SO_2$ (or $SO_3$).

Note the following corollary:

(5.6) Corollary. The composition of two rotations of $\mathbb{R}^3$ about the origin is also a
rotation.

This corollary follows from the theorem because the matrix representing the
composition of two linear operators is the product matrix, and because $SO_3$, being a
subgroup of $GL_3(\mathbb{R})$, is closed under products. It is far from obvious geometrically.
Clearly, the composition of two rotations about the same axis is also a rotation about
that axis. But imagine composing rotations about different axes. What is the axis of
rotation of the composed operator?

Because their elements represent rotations, the groups $SO_2$ and $SO_3$ are called
the two- and three-dimensional rotation groups. Things become more complicated in
dimension $> 3$. For example, the matrix

(5.7) $$\begin{bmatrix} \cos\theta & -\sin\theta & & \\ \sin\theta & \cos\theta & & \\ & & \cos\eta & -\sin\eta \\ & & \sin\eta & \cos\eta \end{bmatrix}$$

is an element of $SO_4$. Left multiplication by this matrix is the composition of a
rotation through the angle $\theta$ on the first two coordinates and a rotation through the angle
$\eta$ on the last two. Such an operation can not be realized as a single rotation.

The proof of Theorem (5.5) is not very difficult, but it would be clumsy if we
did not first introduce some terminology. So we will defer the proof to the end of
the section.

To understand the relationship between orthogonal matrices and rotations, we
will need the dot product of vectors. By definition, the dot product of column
vectors $X$ and $Y$ is

(5.8) $$(X \cdot Y) = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$

It is sometimes useful to write the dot product in matrix form as

(5.9) $$(X \cdot Y) = X^tY.$$

There are two main properties of the dot product of vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$. The
first is that $(X \cdot X)$ is the square of the length of the vector:

$$|X|^2 = x_1^2 + x_2^2 \quad \text{or} \quad x_1^2 + x_2^2 + x_3^2,$$

according to the case. This property, which follows from Pythagoras's theorem, is
the basis for the definition of length of vectors in $\mathbb{R}^n$: The length of $X$ is defined by
the formula

(5.10) $$|X|^2 = (X \cdot X) = x_1^2 + \cdots + x_n^2.$$

The distance between two vectors $X, Y$ is defined to be the length $|X - Y|$ of $X - Y$.
The second important property of dot product in $\mathbb{R}^2$ and $\mathbb{R}^3$ is the formula

(5.11) $$(X \cdot Y) = |X| \, |Y| \cos\theta,$$

where $\theta$ is the angle between the vectors. This formula is a consequence of the law
of cosines

$$c^2 = a^2 + b^2 - 2ab\cos\theta$$

for the side lengths $a, b, c$ of a triangle, where $\theta$ is the angle subtended by the sides
$a, b$. To derive (5.11), we apply the law of cosines to the triangle with vertices
$0, X, Y$. Its side lengths are $|X|$, $|Y|$, and $|X - Y|$, so the law of cosines can be written
as

$$(X - Y \cdot X - Y) = (X \cdot X) + (Y \cdot Y) - 2|X| \, |Y| \cos\theta.$$

The left side expands to

$$(X - Y \cdot X - Y) = (X \cdot X) - 2(X \cdot Y) + (Y \cdot Y),$$

and formula (5.11) is obtained by comparing terms.

The most important application of (5.11) is that two vectors $X$ and $Y$ are
orthogonal, meaning that the angle $\theta$ is $\pi/2$, if and only if $(X \cdot Y) = 0$. This property
is taken as the definition of orthogonality of vectors in $\mathbb{R}^n$:

(5.12) $X$ is orthogonal to $Y$ if $(X \cdot Y) = 0$.

(5.13) Proposition. The following conditions on a real $n \times n$ matrix $A$ are
equivalent:

(a) $A$ is orthogonal.

(b) Multiplication by $A$ preserves dot product, that is, $(AX \cdot AY) = (X \cdot Y)$ for all
column vectors $X, Y$.

(c) The columns of $A$ are mutually orthogonal unit vectors.

A basis consisting of mutually orthogonal unit vectors is called an orthonormal
basis. An orthogonal matrix is one whose columns form an orthonormal basis.

Left multiplication by an orthogonal matrix is also called an orthogonal operator.
Thus the orthogonal operators on $\mathbb{R}^n$ are the ones which preserve dot product.

Proof of Proposition (5.13). We write $(X \cdot Y) = X^tY$. If $A$ is orthogonal, then
$A^tA = I$, so

$$(X \cdot Y) = X^tY = X^tA^tAY = (AX)^t(AY) = (AX \cdot AY).$$

Conversely, suppose that $X^tY = X^tA^tAY$ for all $X$ and $Y$. We rewrite this equality as
$X^tBY = 0$, where $B = I - A^tA$. For any matrix $B$,

(5.14) $$e_i^tBe_j = b_{ij}.$$

So if $X^tBY = 0$ for all $X, Y$, then $e_i^tBe_j = b_{ij} = 0$ for all $i, j$, and $B = 0$. Therefore
$I = A^tA$. This proves the equivalence of (a) and (b). To prove that (a) and (c) are
equivalent, let $A_j$ denote the $j$th column of the matrix $A$. The $(i, j)$ entry of the
product matrix $A^tA$ is $(A_i \cdot A_j)$. Thus $A^tA = I$ if and only if $(A_i \cdot A_i) = 1$ for all $i$,
and $(A_i \cdot A_j) = 0$ for all $i \neq j$, which is to say that the columns have length 1 and are
orthogonal. □
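Conditions (a) and (b) are easy to test numerically. The following sketch, with helper names of our own choosing, checks $A^tA = I$ for the matrix (5.2) at an arbitrarily chosen angle and compares one dot product before and after multiplication by $A$. It uses floating-point arithmetic, so equalities are tested up to a small tolerance.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def is_orthogonal(A, tol=1e-12):
    """Check A^t A = I; entry (i, j) of A^t A is the dot product of columns i and j."""
    n = len(A)
    G = matmul(transpose(A), A)
    return all(abs(G[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

theta = 0.7                                   # an arbitrary angle for the example
c, s = math.cos(theta), math.sin(theta)
A = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]   # the matrix (5.2)
X, Y = [1.0, 2.0, 3.0], [-2.0, 0.5, 4.0]

print(is_orthogonal(A))
print(dot(X, Y), dot(mat_vec(A, X), mat_vec(A, Y)))
print(is_orthogonal([[1.0, 1.0], [0.0, 1.0]]))      # a shear is not orthogonal
```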

The geometric meaning of multiplication by an orthogonal matrix can be
explained in terms of rigid motions. A rigid motion or isometry of $\mathbb{R}^n$ is a map
$m: \mathbb{R}^n \to \mathbb{R}^n$ which is distance preserving; that is, it is a map satisfying the
following condition: If $X, Y$ are points of $\mathbb{R}^n$, then the distance from $X$ to $Y$ is equal to the
distance from $m(X)$ to $m(Y)$:

(5.15) $$|m(X) - m(Y)| = |X - Y|.$$

Such a rigid motion carries a triangle to a congruent triangle, and therefore it
preserves angles and shapes in general.

Note that the composition of two rigid motions is a rigid motion, and that the
inverse of a rigid motion is a rigid motion. Therefore the rigid motions of $\mathbb{R}^n$ form a
group $M_n$, with composition of operations as its law of composition. This group is
called the group of motions.

(5.16) Proposition. Let $m$ be a map $\mathbb{R}^n \to \mathbb{R}^n$. The following conditions on $m$
are equivalent:

(a) $m$ is a rigid motion which fixes the origin.

(b) $m$ preserves dot product; that is, for all $X, Y \in \mathbb{R}^n$, $(m(X) \cdot m(Y)) = (X \cdot Y)$.

(c) $m$ is left multiplication by an orthogonal matrix.

(5.17) Corollary. A rigid motion which fixes the origin is a linear operator.

This follows from the equivalence of (a) and (c).

Proof of Proposition (5.16). We will use the shorthand $'$ to denote the map $m$,
writing $m(X) = X'$. Suppose that $m$ is a rigid motion fixing 0. With the shorthand
notation, the statement (5.15) that $m$ preserves distance reads

(5.18) $$(X' - Y' \cdot X' - Y') = (X - Y \cdot X - Y)$$

for all vectors $X, Y$. Setting $Y = 0$ shows that $(X' \cdot X') = (X \cdot X)$ for all $X$. We
expand both sides of (5.18) and cancel $(X \cdot X)$ and $(Y \cdot Y)$, obtaining
$(X' \cdot Y') = (X \cdot Y)$. This shows that $m$ preserves dot product, hence that (a) implies (b).

To prove that (b) implies (c), we note that the only map which preserves dot
product and which also fixes each of the basis vectors $e_i$ is the identity. For, if $m$
preserves dot product, then $(X \cdot e_j) = (X' \cdot e_j')$ for any $X$. If $e_j' = e_j$ as well, then

$$x_j = (X \cdot e_j) = (X' \cdot e_j') = (X' \cdot e_j) = x_j'$$

for all $j$. Hence $X = X'$, and $m$ is the identity.

Now suppose that $m$ preserves dot product. Then the images $e_1', \ldots, e_n'$ of the
standard basis vectors are orthonormal: $(e_i' \cdot e_i') = 1$ and $(e_i' \cdot e_j') = 0$ if $i \neq j$.
Let $\mathbf{B}' = (e_1', \ldots, e_n')$, and let $A = [\mathbf{B}']$. According to Proposition (5.13), $A$ is an
orthogonal matrix. Since the orthogonal matrices form a group, $A^{-1}$ is also orthogonal.
This being so, multiplication by $A^{-1}$ preserves dot product too. So the composed
motion $A^{-1}m$ preserves dot product, and it fixes each of the basis vectors $e_i$. Therefore
$A^{-1}m$ is the identity map. This shows that $m$ is left multiplication by $A$, as required.
Finally, if $m$ is a linear operator whose matrix $A$ is orthogonal, then
$X' - Y' = (X - Y)'$ because $m$ is linear, and $|X' - Y'| = |(X - Y)'| = |X - Y|$ by
(5.13b). So $m$ is a rigid motion. Since a linear operator also fixes 0, this shows that
(c) implies (a). □

One class of rigid motions which do not fix the origin, and which are therefore
not linear operators, is the translations. Given any fixed vector $b = (b_1, \ldots, b_n)^t$ in
$\mathbb{R}^n$, translation by $b$ is the map

(5.19) $$t_b(X) = X + b = \begin{bmatrix} x_1 + b_1 \\ \vdots \\ x_n + b_n \end{bmatrix}.$$

This map is a rigid motion because $t_b(X) - t_b(Y) = (X + b) - (Y + b) = X - Y$,
and hence $|t_b(X) - t_b(Y)| = |X - Y|$.

(5.20) Proposition. Every rigid motion $m$ is the composition of an orthogonal linear
operator and a translation. In other words, it has the form $m(X) = AX + b$ for
some orthogonal matrix $A$ and some vector $b$.

Proof. Let $b = m(0)$. Then $t_{-b}(b) = 0$, so the composed operation $t_{-b}m$ is a
rigid motion which fixes the origin: $t_{-b}(m(0)) = 0$. According to Proposition (5.16),
$t_{-b}m$ is left multiplication by an orthogonal matrix $A$: $t_{-b}m(X) = AX$. Applying $t_b$ to
both sides of this equation, we find $m(X) = AX + b$.

Note that both the vector $b$ and the matrix $A$ are uniquely determined by $m$,
because $b = m(0)$ and $A$ is the operator $t_{-b}m$. □
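The proof is constructive, and it can be turned into a short procedure: evaluate $m$ at the origin to get $b$, then read off the columns of $A$ as $m(e_j) - b$. The sketch below is our own illustration. The function `decompose` and the sample motion `m` are hypothetical, and the procedure assumes that its input really is a rigid motion.

```python
def decompose(m, n):
    """Given a rigid motion m of R^n (a function on lists of length n),
    return (A, b) with m(X) = AX + b, where b = m(0) and column j of A is m(e_j) - b."""
    b = m([0.0] * n)
    cols = []
    for j in range(n):
        e_j = [1.0 if i == j else 0.0 for i in range(n)]
        image = m(e_j)
        cols.append([image[i] - b[i] for i in range(n)])
    A = [[cols[j][i] for j in range(n)] for i in range(n)]   # columns to rows
    return A, b

# A sample motion: rotate the plane by 90 degrees, then translate by (2, -1).
def m(X):
    x, y = X
    return [-y + 2.0, x - 1.0]

A, b = decompose(m, 2)
print(A, b)
```

Here $A$ comes out as the rotation matrix with $\cos\theta = 0$, $\sin\theta = 1$, and $b = (2, -1)^t$.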

Recall that the determinant of an orthogonal matrix is $\pm 1$. An orthogonal
operator is called orientation-preserving if its determinant is $+1$, and
orientation-reversing if its determinant is $-1$. Similarly, let $m$ be a rigid motion. We write
$m(X) = AX + b$ as above. Then $m$ is called orientation-preserving if $\det A = 1$, and
orientation-reversing if $\det A = -1$. A motion of $\mathbb{R}^2$ is orientation-reversing if it
flips the plane over, and orientation-preserving if it does not.

Combining Theorem (5.5) with Proposition (5.16) gives us the following
characterization of rotations:

(5.21) Corollary. The rotations of $\mathbb{R}^2$ and $\mathbb{R}^3$ are the orientation-preserving rigid
motions which fix the origin. □

We now proceed to the proof of Theorem (5.5), which characterizes the rotations
of $\mathbb{R}^2$ and $\mathbb{R}^3$ about the origin. Every rotation $\rho$ is a rigid motion, so
Proposition (5.16) tells us that $\rho$ is multiplication by an orthogonal matrix $A$. Also, the
determinant of $A$ is 1. This is because $\det A = \pm 1$ for any orthogonal matrix, and
because the determinant varies continuously with the angle of rotation. When the
angle is zero, $A$ is the identity matrix, which has determinant 1. Thus the matrix of a
rotation is an element of $SO_2$ or $SO_3$.

Conversely, let $A \in SO_2$ be an orthogonal $2 \times 2$ matrix of determinant 1. Let
$v_1$ denote the first column $Ae_1$ of $A$. Since $A$ is orthogonal, $v_1$ is a unit vector. There
is a rotation $R$ (3.1) such that $Re_1 = v_1$ too. Then $B = R^{-1}A$ fixes $e_1$. Also, $A$ and $R$
are elements of $SO_2$, and this implies that $B$ is in $SO_2$. So the columns of $B$ form an
orthonormal basis of $\mathbb{R}^2$, and the first column is $e_1$. Being of length 1 and
orthogonal to $e_1$, the second column must be either $e_2$ or $-e_2$, and the second case is ruled
out by the fact that $\det B = 1$. It follows that $B = I$ and that $A = R$. So $A$ is a
rotation.

To prove that an element $A$ of $SO_3$ represents a rotation, we'd better decide on
a definition of a rotation $\rho$ of $\mathbb{R}^3$ about the origin. We will require the following:

(5.22)

(i) $\rho$ is a rigid motion which fixes the origin;

(ii) $\rho$ also fixes a nonzero vector $v$;

(iii) $\rho$ operates as a rotation on the plane $P$ orthogonal to $v$.

According to Proposition (5.16), the first condition is equivalent to saying that $\rho$ is
an orthogonal operator. So our matrix $A \in SO_3$ satisfies this condition. Condition
(ii) can be stated by saying that $v$ is an eigenvector for the operator $\rho$, with
eigenvalue 1. Then since $\rho$ preserves orthogonality, it sends the orthogonal space $P$ to
itself. In other words, $P$ is an invariant subspace. Condition (iii) says that the
restriction of $\rho$ to this invariant subspace is a rotation.

Notice that the matrix (5.2) does satisfy these conditions, with $v = e_1$.

(5.23) Lemma. Every element $A \in SO_3$ has the eigenvalue 1.

Proof. We will show that $\det(A - I) = 0$. This will prove the lemma [see
(4.8)]. This proof is tricky, but efficient. Recall that $\det A = \det A^t$ for any matrix $A$,
so $\det A^t = 1$. Since $A$ is orthogonal, $A^t(A - I) = (I - A)^t$. Then

$$\det(A - I) = \det A^t(A - I) = \det(I - A)^t = \det(I - A).$$

On the other hand, for any $3 \times 3$ matrix $B$, $\det(-B) = -\det B$. Therefore
$\det(A - I) = -\det(I - A)$, and it follows that $\det(A - I) = 0$. □
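The lemma can be illustrated numerically by composing two rotations about different axes and locating the fixed vector of the product. The sketch below is our own and rests on two standard identities that are assumed here rather than taken from the text: for a rotation through $\theta$ about a unit vector $v$, the trace is $1 + 2\cos\theta$, and $A - A^t$ has entries $\pm 2\sin\theta \, v_i$. It does not apply when $\theta$ is 0 or $\pi$.

```python
import math

def rot_x(t):
    """Rotation through the angle t about e_1, as in (5.2)."""
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_z(t):
    """Rotation through the angle t about e_3."""
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def axis_and_angle(A):
    """Unit axis vector and angle in (0, pi) of a matrix A in SO_3.
    Assumes the angle is neither 0 nor pi."""
    v = [A[2][1] - A[1][2], A[0][2] - A[2][0], A[1][0] - A[0][1]]
    norm = math.sqrt(sum(x * x for x in v))
    if norm < 1e-12:
        raise ValueError("angle is 0 or pi; this formula does not determine the axis")
    cos_theta = (A[0][0] + A[1][1] + A[2][2] - 1.0) / 2.0
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return [x / norm for x in v], theta

C = matmul(rot_z(0.9), rot_x(0.4))     # rotations about two different axes
v, theta = axis_and_angle(C)
print(v, theta)
print(mat_vec(C, v))                   # agrees with v up to rounding: C v = v
```

The vector `v` is an eigenvector of the product with eigenvalue 1, which answers, for this example, the question of what the axis of the composed rotation is.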

Now given a matrix $A \in SO_3$, the lemma shows that left multiplication by $A$
fixes a nonzero vector $v_1$. We normalize its length to 1, and we choose orthogonal
unit vectors $v_2, v_3$ lying in the plane $P$ orthogonal to $v_1$. Then $\mathbf{B} = (v_1, v_2, v_3)$ is an
orthonormal basis of $\mathbb{R}^3$. The matrix $P = [\mathbf{B}]^{-1}$ is orthogonal because $[\mathbf{B}]$ is
orthogonal, and $A' = PAP^{-1}$ represents the same operator as $A$ does, with respect to the basis
$\mathbf{B}$. Since $A$ and $P$ are orthogonal, so is $A'$. Also $\det A' = \det A = 1$. So $A' \in SO_3$.

Since $v_1$ is an eigenvector with eigenvalue 1, the first column of $A'$ is $e_1$. Since
$A'$ is orthogonal, the other columns are orthogonal to $e_1$, and $A'$ has the block form

$$A' = \begin{bmatrix} 1 & 0 \\ 0 & R \end{bmatrix}.$$

Using the fact that $A' \in SO_3$, one finds that $R \in SO_2$. So $R$ is a rotation. This shows
that $A'$ has the form (5.2) and that it represents a rotation. Hence $A$ does too. This
completes the proof of Theorem (5.5). □

(5.24) Note. To keep the new basis separate from the old basis, we denoted it by $\mathbf{B}'$
in Chapter 3. The prime is not needed when the old basis is the standard basis, and
since it clutters the notation, we will often drop it, as we did here.


6. DIAGONALIZATION

In this section we show that for "most" linear operators on a complex vector space,
there is a basis such that the matrix of the operator is diagonal. The key fact, which
we already noted at the end of Section 4, is that every complex polynomial of
positive degree has a root. This tells us that every linear operator has an eigenvector.

(6.1) Proposition.

(a) Vector space form: Let $T$ be a linear operator on a finite-dimensional complex
vector space $V$. There is a basis $\mathbf{B}$ of $V$ such that the matrix $A$ of $T$ is upper
triangular.

(b) Matrix form: Every complex $n \times n$ matrix $A$ is similar to an upper triangular
matrix. In other words, there is a matrix $P \in GL_n(\mathbb{C})$ such that $PAP^{-1}$ is upper
triangular.

Proof. The two assertions are equivalent, because of (3.5). We begin by
applying (4.19b), which shows the existence of an eigenvector, call it $v_1'$. Extend to a
basis $\mathbf{B}' = (v_1', \ldots, v_n')$ for $V$. Then by (3.11), the first column of the matrix $A'$ of $T$
with respect to $\mathbf{B}'$ will be $(c_1, 0, \ldots, 0)^t$, where $c_1$ is the eigenvalue of $v_1'$. Therefore
$A'$ has the form

$$A' = \left[\begin{array}{c|ccc} c_1 & * & \cdots & * \\ \hline 0 & & & \\ \vdots & & B & \\ 0 & & & \end{array}\right],$$

where $B$ is an $(n - 1) \times (n - 1)$ matrix. The matrix version of this reduction is this:
Given any $n \times n$ matrix $A$, there is a $P \in GL_n(\mathbb{C})$ such that $A' = PAP^{-1}$ has the
above form. Now apply induction on $n$. By induction, we may assume that the
existence of some $Q \in GL_{n-1}(\mathbb{C})$ such that $QBQ^{-1}$ is triangular has been proved. Let $Q_1$
be the $n \times n$ matrix

$$Q_1 = \left[\begin{array}{c|ccc} 1 & 0 & \cdots & 0 \\ \hline 0 & & & \\ \vdots & & Q & \\ 0 & & & \end{array}\right].$$

Then

$$(Q_1P)A(Q_1P)^{-1} = Q_1(PAP^{-1})Q_1^{-1} = Q_1A'Q_1^{-1}$$

has the form

$$\left[\begin{array}{c|ccc} c_1 & * & \cdots & * \\ \hline 0 & & & \\ \vdots & & QBQ^{-1} & \\ 0 & & & \end{array}\right],$$

which is triangular. □

As we mentioned, the important point in the proof is that every complex
polynomial has a root. The same proof will work for any field $F$, provided that all the
roots of the characteristic polynomial are in the field.

(6.2) Corollary. Let $F$ be a field.

(a) Vector space form: Let $T$ be a linear operator on a finite-dimensional vector
space $V$ over $F$, and suppose that the characteristic polynomial of $T$ factors into
linear factors in the field $F$. Then there is a basis $\mathbf{B}$ of $V$ such that the matrix $A$
of $T$ is triangular.

(b) Matrix form: Let $A$ be an $n \times n$ matrix whose characteristic polynomial factors
into linear factors in the field $F$. There is a matrix $P \in GL_n(F)$ such that $PAP^{-1}$
is triangular.

Proof. The proof is the same, except that to make the induction step one has
to check that the characteristic polynomial of the matrix $B$ is $p(t)/(t - c_1)$, where
$p(t)$ is the characteristic polynomial of $A$. This is true because $p(t)$ is also the
characteristic polynomial of $A'$ (4.17), and because $\det(tI - A') = (t - c_1)\det(tI - B)$.
So our hypothesis that the characteristic polynomial factors into linear factors carries
over from $A$ to $B$. □

Let us now ask which matrices $A$ are similar to diagonal matrices. As we saw
in (3.12), these are the matrices $A$ which have a basis of eigenvectors. Suppose
again that $F = \mathbb{C}$, and look at the roots of the characteristic polynomial $p(t)$. Each
root is the eigenvalue associated to some eigenvector, and an eigenvector has only
one eigenvalue. Most complex polynomials of degree $n$ have $n$ distinct roots. So
most complex matrices have $n$ eigenvectors with different eigenvalues, and it is
reasonable to suppose that these eigenvectors may form a basis. This is true.

(6.3) Proposition. Let $v_1, \ldots, v_r \in V$ be eigenvectors for a linear operator $T$, with
distinct eigenvalues $c_1, \ldots, c_r$. Then the set $(v_1, \ldots, v_r)$ is linearly independent.

Proof. Induction on $r$: Suppose that a dependence relation

$$0 = a_1 v_1 + \cdots + a_r v_r$$

is given. We must show that $a_i = 0$ for all $i$, and to do so we apply the operator $T$:

$$0 = T(0) = a_1 T(v_1) + \cdots + a_r T(v_r) = a_1 c_1 v_1 + \cdots + a_r c_r v_r.$$

This is a second dependence relation among $(v_1, \ldots, v_r)$. We eliminate $v_r$ from the
two relations, multiplying the first relation by $c_r$ and subtracting the second:

$$0 = a_1(c_r - c_1)v_1 + \cdots + a_{r-1}(c_r - c_{r-1})v_{r-1}.$$

Applying the principle of induction, we assume that $(v_1, \ldots, v_{r-1})$ are independent.
Then the coefficients $a_1(c_r - c_1), \ldots, a_{r-1}(c_r - c_{r-1})$ are all zero. Since the $c_i$'s are
distinct, $c_r - c_i \neq 0$ if $i < r$. Thus $a_1 = \cdots = a_{r-1} = 0$, and the original relation
is reduced to $0 = a_r v_r$. Since an eigenvector can not be zero, $a_r = 0$ too. □

The next theorem follows by combining (3.12) and (6.3): 


(6.4) Theorem. Let $T$ be a linear operator on a vector space $V$ of dimension $n$ over
a field $F$. Assume that its characteristic polynomial has $n$ distinct roots in $F$. Then
there is a basis for $V$ with respect to which the matrix of $T$ is diagonal. □


Note that the diagonal entries are determined, except for their order, by the 
linear operator T. They are the eigenvalues. 

When $p(t)$ has multiple roots, there is usually no basis of eigenvectors, and it
is harder to find a nice matrix for $T$. The study of this case leads to what is called the
Jordan canonical form for a matrix, which will be discussed in Chapter 12.

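As an example of diagonalization, consider the matrix

$$A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix},$$

whose eigenvectors were computed in (4.10). These eigenvectors form a basis
$\mathbf{B} = (v_1, v_2)$ of $\mathbb{R}^2$. According to [Chapter 3 (4.20), see also Note (5.24)], the matrix
relating the standard basis $\mathbf{E}$ to this basis $\mathbf{B}$ is

(6.5) $$P = [\mathbf{B}]^{-1} = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}^{-1} = -\frac{1}{3}\begin{bmatrix} -1 & -2 \\ -1 & 1 \end{bmatrix},$$

and $PAP^{-1} = A'$ is diagonal:

(6.6) $$-\frac{1}{3}\begin{bmatrix} -1 & -2 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix} = A'.$$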

The general rule is stated in Corollary (6.7):


(6.7) Corollary. If a basis $\mathbf{B}$ of eigenvectors of $A$ in $F^n$ is known and if $P = [\mathbf{B}]^{-1}$,
then $A' = PAP^{-1}$ is diagonal. □


The importance of Theorem (6.4) comes from the fact that it is easy to compute
with diagonal matrices. For example, if $A' = PAP^{-1}$ is diagonal, then we can
compute powers of the matrix $A$ using the formula

(6.8) $$A^k = (P^{-1}A'P)^k = P^{-1}A'^{\,k}P.$$

Thus if $A$ is the matrix (4.9), then

$$A^k = P^{-1}\begin{bmatrix} 5^k & 0 \\ 0 & 2^k \end{bmatrix}P = \frac{1}{3}\begin{bmatrix} 5^k + 2 \cdot 2^k & 2(5^k - 2^k) \\ 5^k - 2^k & 2 \cdot 5^k + 2^k \end{bmatrix}.$$
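
The closed form for $A^k$ can be checked mechanically. The following is a minimal sketch, assuming the example matrix is $A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ with eigenvalues 5 and 2; the function names are hypothetical and chosen only for this illustration. It compares repeated multiplication with the diagonalization formula $A^k = P^{-1}\,\mathrm{diag}(5^k, 2^k)\,P$.

```python
# Illustration only: helper names are not from the text.
def mat_mul(X, Y):
    """Multiply two 2x2 integer matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def power_by_multiplication(A, k):
    """Compute A^k by k repeated multiplications (k >= 0)."""
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

def power_by_diagonalization(k):
    """Closed form of A^k for A = [[3, 2], [1, 4]] (assumed example matrix)."""
    f, t = 5 ** k, 2 ** k
    # Each numerator is divisible by 3 because 5 and 2 agree modulo 3,
    # so integer division is exact here.
    return [[(f + 2 * t) // 3, (2 * f - 2 * t) // 3],
            [(f - t) // 3, (2 * f + t) // 3]]

A = [[3, 2], [1, 4]]
for k in range(8):
    assert power_by_multiplication(A, k) == power_by_diagonalization(k)
```

The diagonalized form needs only two scalar powers, whereas the direct method needs $k$ matrix products; this is the computational advantage the paragraph above refers to.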



7. SYSTEMS OF DIFFERENTIAL EQUATIONS


We learn in calculus that the solutions to the first-order linear differential equation

(7.1) $$\frac{dx}{dt} = ax$$

are $x(t) = ce^{at}$, $c$ being an arbitrary constant. Indeed, $ce^{at}$ obviously solves (7.1).
To show that every solution has this form, let $x(t)$ be an arbitrary differentiable
function which is a solution. We differentiate $e^{-at}x(t)$ using the product rule:

$$\frac{d}{dt}\bigl(e^{-at}x(t)\bigr) = -ae^{-at}x(t) + e^{-at}ax(t) = 0.$$

Thus $e^{-at}x(t)$ is a constant $c$, and $x(t) = ce^{at}$.

As an application of diagonalization, we will extend this solution to systems of
differential equations. In order to write our equations in matrix notation, we use the
following terminology. A vector-valued function $X(t)$ is a vector whose entries are
functions of $t$. Similarly, a matrix-valued function $A(t)$ is a matrix whose entries are
functions:

(7.2) $$X(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \qquad A(t) = \begin{bmatrix} a_{11}(t) & \cdots & a_{1n}(t) \\ \vdots & & \vdots \\ a_{m1}(t) & \cdots & a_{mn}(t) \end{bmatrix}.$$


The calculus operations of taking limits, differentiating, and so on are extended
to vector-valued and matrix-valued functions by performing the operations on
each entry separately. Thus by definition

(7.3) $$\lim_{t \to t_0} X(t) = \begin{bmatrix} \xi_1 \\ \vdots \\ \xi_n \end{bmatrix}, \quad \text{where } \xi_i = \lim_{t \to t_0} x_i(t).$$

So this limit exists if and only if $\lim x_i(t)$ exists for each $i$. Similarly, the derivative
of a vector-valued or matrix-valued function is the function obtained by differentiating
each entry separately:

$$\frac{dX}{dt} = \begin{bmatrix} x_1'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}, \qquad \frac{dA}{dt} = \begin{bmatrix} a_{11}'(t) & \cdots & a_{1n}'(t) \\ \vdots & & \vdots \\ a_{m1}'(t) & \cdots & a_{mn}'(t) \end{bmatrix},$$

where $x_i'(t)$ is the derivative of $x_i(t)$, and so on. So $dX/dt$ is defined if and only if
each of the functions $x_i(t)$ is differentiable. The derivative can also be described in
vector notation, as

(7.4) $$\frac{dX}{dt} = \lim_{h \to 0} \frac{X(t + h) - X(t)}{h}.$$

Here $X(t + h) - X(t)$ is computed by vector addition and the $h$ in the denominator
stands for scalar multiplication by $h^{-1}$. The limit is obtained by evaluating the limit
of each entry separately, as above. So the entries of (7.4) are the derivatives $x_i'(t)$.
The same is true for matrix-valued functions.

A system of homogeneous first-order linear, constant-coefficient differential
equations is a matrix equation of the form

(7.5) $$\frac{dX}{dt} = AX,$$

where $A$ is an $n \times n$ real or complex matrix and $X(t)$ is an $n$-dimensional vector-valued
function. Writing out such a system, we obtain a system of $n$ differential
equations, of the form

(7.6) $$\begin{aligned} \frac{dx_1}{dt} &= a_{11}x_1(t) + \cdots + a_{1n}x_n(t) \\ &\;\;\vdots \\ \frac{dx_n}{dt} &= a_{n1}x_1(t) + \cdots + a_{nn}x_n(t). \end{aligned}$$

The $x_i(t)$ are unknown functions, and the $a_{ij}$ are scalars. For example, if we
substitute the matrix $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$ for $A$, (7.5) becomes a system of two equations in two
unknowns:

(7.7) $$\begin{aligned} \frac{dx_1}{dt} &= 3x_1 + 2x_2 \\ \frac{dx_2}{dt} &= x_1 + 4x_2. \end{aligned}$$

The simplest systems (7.5) are those in which $A$ is a diagonal matrix. Let the
diagonal entries be $a_i$. Then equation (7.6) reads

(7.8) $$\frac{dx_i}{dt} = a_i x_i(t), \quad i = 1, \ldots, n.$$

Here the unknown functions $x_i$ are not mixed up by the equations, so we can solve
for each one separately:

(7.9) $$x_i = c_i e^{a_i t},$$

for some constant $c_i$.

The observation which allows us to solve the differential equation (7.5) in most
cases is this: If $v$ is an eigenvector for $A$ with eigenvalue $a$, then

(7.10) $$X = e^{at}v$$

is a particular solution of (7.5). Here $e^{at}v$ is to be interpreted as the scalar product
of the function $e^{at}$ and the vector $v$. Differentiation operates on the scalar function,
fixing the constant vector $v$, while multiplication by $A$ operates on the vector $v$,
fixing the scalar function $e^{at}$. Thus $\frac{d}{dt}e^{at}v = ae^{at}v = Ae^{at}v$. For example, $(2, -1)^t$ is
an eigenvector with eigenvalue 2 of the matrix $\begin{bmatrix} 3 & 2 \\ 1 & 4 \end{bmatrix}$, and $\begin{bmatrix} 2e^{2t} \\ -e^{2t} \end{bmatrix}$ solves the system
of differential equations (7.7).

This observation allows us to solve (7.5) whenever the matrix $A$ has distinct
real eigenvalues. In that case every solution will be a linear combination of the special
solutions (7.10). To work this out, it is convenient to diagonalize. Let us replace
the notation $'$ used in the previous section by $\tilde{\ }$ here, to avoid confusion with
differentiation. Let $P$ be an invertible matrix such that $PAP^{-1} = \tilde{A}$ is diagonal. So
$P = [\mathbf{B}]^{-1}$, where $\mathbf{B}$ is a basis of eigenvectors. We make the linear change of variable

(7.11) $$X = P^{-1}\tilde{X}.$$


Then

(7.12) $$\frac{dX}{dt} = P^{-1}\frac{d\tilde{X}}{dt}.$$

Substituting into (7.5), we find

(7.13) $$\frac{d\tilde{X}}{dt} = PAP^{-1}\tilde{X} = \tilde{A}\tilde{X}.$$

Since $\tilde{A}$ is diagonal, the variables $\tilde{x}_i$ have been separated, so the equation can be
solved in terms of exponentials. The diagonal entries of $\tilde{A}$ are the eigenvalues
$\lambda_1, \ldots, \lambda_n$ of $A$, so the solution of the system (7.13) is

(7.14) $$\tilde{x}_i = c_i e^{\lambda_i t}, \quad \text{for some } c_i.$$

Substituting back,

(7.15) $$X = P^{-1}\tilde{X}$$

solves the original system (7.5). This proves the following:

(7.16) Proposition. Let $A$ be an $n \times n$ matrix, and let $P$ be an invertible matrix
such that $PAP^{-1} = \tilde{A}$ is diagonal, with diagonal entries $\lambda_1, \ldots, \lambda_n$. The general solution
of the system $\frac{dX}{dt} = AX$ is $X = P^{-1}\tilde{X}$, where $\tilde{x}_i = c_i e^{\lambda_i t}$ for some arbitrary
constants $c_i$. □


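The matrix which diagonalizes $A$ in example (7.7) was computed in (6.5):

(7.17) $$P = -\frac{1}{3}\begin{bmatrix} -1 & -2 \\ -1 & 1 \end{bmatrix}, \qquad P^{-1} = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}, \qquad \tilde{A} = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}.$$

Thus

(7.18) $$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} c_1 e^{5t} \\ c_2 e^{2t} \end{bmatrix} = \begin{bmatrix} c_1 e^{5t} + 2c_2 e^{2t} \\ c_1 e^{5t} - c_2 e^{2t} \end{bmatrix}.$$

In other words, every solution is a linear combination of the two basic solutions

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} e^{5t} \\ e^{5t} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2e^{2t} \\ -e^{2t} \end{bmatrix}.$$

These are the solutions (7.10) corresponding to the eigenvectors $(1, 1)^t$ and $(2, -1)^t$.
The coefficients $c_i$ appearing in these solutions are arbitrary. They are usually
determined by assigning initial conditions, meaning the value of $X$ at some particular $t_0$.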
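
To illustrate how initial conditions pin down the constants, here is a minimal sketch, assuming the system is $dx_1/dt = 3x_1 + 2x_2$, $dx_2/dt = x_1 + 4x_2$ with basic solutions $e^{5t}(1, 1)^t$ and $e^{2t}(2, -1)^t$, and taking $t_0 = 0$. The function names and the finite-difference check are choices made for this illustration, not part of the text.

```python
import math

# Coefficient matrix of the assumed example system.
A = [[3.0, 2.0], [1.0, 4.0]]

def solve_initial_value(x1_0, x2_0):
    """Return the solution X(t) with X(0) = (x1_0, x2_0)."""
    # X(0) = c1*(1, 1) + c2*(2, -1); solve this 2x2 linear system.
    c2 = (x1_0 - x2_0) / 3.0
    c1 = (x1_0 + 2.0 * x2_0) / 3.0

    def X(t):
        e5, e2 = math.exp(5.0 * t), math.exp(2.0 * t)
        return (c1 * e5 + 2.0 * c2 * e2, c1 * e5 - c2 * e2)

    return X

def residual(X, t, h=1e-6):
    """Largest entry of |dX/dt - AX| at time t, via a central difference."""
    xp, xm, x = X(t + h), X(t - h), X(t)
    dx = [(xp[i] - xm[i]) / (2.0 * h) for i in range(2)]
    ax = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
    return max(abs(dx[i] - ax[i]) for i in range(2))

X = solve_initial_value(1.0, 0.0)
print(X(0.0), residual(X, 0.3))
```

The residual is only a numerical sanity check; the exact statement that these functions solve the system is the content of Proposition (7.16).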

Let us now consider the case that the coefficient matrix $A$ has distinct eigenvalues,
but that they are not all real. To copy the method which we used above, we
must first consider differential equations of the form (7.1), in which $a$ is a complex
number. Properly interpreted, the solutions of such a differential equation still have
the form $ce^{at}$. The only thing to remember is that $e^{at}$ will now be a complex-valued
function of $t$. In order to focus attention, we restrict the variable $t$ to real values
here, although this is not the most natural choice when working with complex-valued
functions. Allowing $t$ to take on complex values would not change things very
much.

The definition of the derivative of a complex-valued function is the same as for
real-valued functions:

(7.19) $$\frac{dx}{dt} = \lim_{h \to 0} \frac{x(t + h) - x(t)}{h},$$

provided that this limit exists. There are no new features. We can write any such
function $x(t)$ in terms of its real and imaginary parts, which will be real-valued
functions:

(7.20) $$x(t) = u(t) + iv(t).$$

Then $x$ is differentiable if and only if $u$ and $v$ are differentiable, and if they are, the
derivative of $x$ is $x' = u' + iv'$. This follows directly from the definition. The
usual rules for differentiation, such as the product rule, hold for complex-valued
functions. These rules can be proved by applying the corresponding theorem for real
functions to $u$ and $v$, or else by carrying the proof for real functions over to the
complex case.

Recall the formula

(7.21) $$e^{r+si} = e^r(\cos s + i \sin s).$$

Differentiation of this formula shows that $de^{at}/dt = ae^{at}$ for all complex numbers
$a = r + si$. Therefore $ce^{at}$ solves the differential equation (7.1), and the proof
given at the beginning of the section shows that these are the only solutions.

Having extended the case of one equation to complex coefficients, we can now
use the method of diagonalization to solve a system of equations (7.5) when $A$ is an
arbitrary complex matrix with distinct eigenvalues.


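For example, let $A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$. The vectors $v_1 = \begin{bmatrix} 1 \\ i \end{bmatrix}$ and $v_2 = \begin{bmatrix} i \\ 1 \end{bmatrix}$
are eigenvectors, with eigenvalues $1 + i$ and $1 - i$ respectively. Let $\mathbf{B} = (v_1, v_2)$.
According to (6.7), $A$ is diagonalized by the matrix $P$, where

(7.22) $$P^{-1} = [\mathbf{B}] = \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}.$$

Formula (7.14) tells us that $\tilde{X} = \begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \end{bmatrix} = \begin{bmatrix} c_1 e^{t+it} \\ c_2 e^{t-it} \end{bmatrix}$. The solutions of (7.5) are

(7.23) $$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P^{-1}\tilde{X} = \begin{bmatrix} c_1 e^{t+it} + ic_2 e^{t-it} \\ ic_1 e^{t+it} + c_2 e^{t-it} \end{bmatrix},$$

where $c_1, c_2$ are arbitrary complex numbers. So every solution is a linear combination
of the two basic solutions

(7.24) $$\begin{bmatrix} e^{t+it} \\ ie^{t+it} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} ie^{t-it} \\ e^{t-it} \end{bmatrix}.$$

However, these solutions are not completely satisfactory, because we began with a
system of differential equations with real coefficients, and the answer we obtained is
complex. When the original matrix is real, we want to have real solutions. We note
the following lemma: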


(7.25) Lemma. Let $A$ be a real $n \times n$ matrix, and let $X(t)$ be a complex-valued
solution of the differential equation (7.5). The real and imaginary parts of $X(t)$
solve the same equation. □


Now every solution of the original equation (7.5), whether real or complex,
has the form (7.23) for some complex numbers $c_i$. So the real solutions are among
those we have found. To write them down explicitly, we may take the real and
imaginary parts of the complex solutions.

The real and imaginary parts of the basic solutions (7.24) are determined using
(7.21). They are

(7.26) $$\begin{bmatrix} e^t \cos t \\ -e^t \sin t \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} e^t \sin t \\ e^t \cos t \end{bmatrix}.$$

Every real solution is a real linear combination of these particular solutions.
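
The passage from complex to real solutions can be checked numerically. The sketch below assumes the example matrix is $A = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$ with complex basic solution $e^{(1+i)t}(1, i)^t$; it takes real and imaginary parts as in Lemma (7.25) and tests them against the system with a finite difference. All function names are hypothetical.

```python
import cmath
import math

# Assumed real coefficient matrix with eigenvalues 1 + i and 1 - i.
A = [[1.0, 1.0], [-1.0, 1.0]]

def complex_basic_solution(t):
    """The complex solution e^{(1+i)t} * (1, i)."""
    e = cmath.exp((1 + 1j) * t)
    return (e, 1j * e)

def real_part(t):
    z = complex_basic_solution(t)
    return (z[0].real, z[1].real)

def imag_part(t):
    z = complex_basic_solution(t)
    return (z[0].imag, z[1].imag)

def residual(X, t, h=1e-6):
    """Largest entry of |dX/dt - AX| at time t, via a central difference."""
    xp, xm, x = X(t + h), X(t - h), X(t)
    dx = [(xp[i] - xm[i]) / (2.0 * h) for i in range(2)]
    ax = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
    return max(abs(dx[i] - ax[i]) for i in range(2))

t = 0.7
# The real part should agree with (e^t cos t, -e^t sin t).
expected = (math.exp(t) * math.cos(t), -math.exp(t) * math.sin(t))
print(real_part(t), expected, residual(real_part, t), residual(imag_part, t))
```

Both parts solve the same real system, so any real combination of them does as well.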


8. THE MATRIX EXPONENTIAL 


Systems of first-order linear, constant-coefficient differential equations can also be
solved formally, using the matrix exponential. The exponential of an $n \times n$ real or
complex matrix $A$ is obtained by substituting a matrix into the Taylor's series

(8.1) $$1 + x/1! + x^2/2! + x^3/3! + \cdots$$

for $e^x$. Thus by definition,

(8.2) $$e^A = I + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots.$$

This is an $n \times n$ matrix.
(8.3) Proposition. The series (8.2) converges absolutely for all complex matrices $A$.

In order not to break up the discussion, we have collected the proofs together at the
end of the section.


Since matrix multiplication is relatively complicated, it isn't easy to write
down the matrix entries of $e^A$ directly. In particular, the entries of $e^A$ are usually
not obtained by exponentiating the entries of $A$. But one case in which they are, and
in which the exponential is easily computed, is when $A$ is a diagonal matrix, say with
diagonal entries $a_i$. Inspection of the series shows that $e^A$ is also diagonal in this
case and that its diagonal entries are $e^{a_i}$.
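
The definition (8.2) can be tried out directly by summing finitely many terms. The following is a minimal sketch; the function names and the cutoff of 30 terms are arbitrary choices for this illustration, and a truncated series is only an approximation to $e^A$.

```python
import math

def mat_mul(X, Y):
    """Multiply two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, terms=30):
    """Partial sum I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]  # holds A^k / k!, starting at k = 0
    for k in range(1, terms):
        term = mat_mul(term, A)
        term = [[entry / k for entry in row] for row in term]
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return total

# A diagonal matrix: the exponential is diagonal with entries e^{a_i}.
D = [[1.0, 0.0], [0.0, -2.0]]
E = exp_series(D)
print(E, (math.exp(1.0), math.exp(-2.0)))
```

For a diagonal input every power is diagonal, so the off-diagonal entries of the partial sums stay exactly zero, which is the "inspection of the series" mentioned above.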

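The exponential is also relatively easy to compute for a triangular $2 \times 2$ matrix.
For example, let

(8.4) $$A = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}.$$

Then

(8.5) $$e^A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1}{1!}\begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix} + \frac{1}{2!}\begin{bmatrix} 1 & 3 \\ 0 & 4 \end{bmatrix} + \cdots = \begin{bmatrix} e & * \\ 0 & e^2 \end{bmatrix}.$$

The diagonal entries are exponentiated to obtain the diagonal entries of $e^A$. It is a
good exercise to calculate the missing entry $*$ directly from the definition.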

The exponential of a matrix $A$ can also be determined whenever we know a
matrix $P$ such that $PAP^{-1}$ is diagonal. Using the rule $PA^kP^{-1} = (PAP^{-1})^k$ and the
distributive law for matrix multiplication, we find

(8.6) $$Pe^AP^{-1} = PIP^{-1} + (PAP^{-1}) + \frac{1}{2!}(PAP^{-1})^2 + \cdots = e^{PAP^{-1}}.$$

Suppose that $PAP^{-1} = \tilde{A}$ is diagonal, with diagonal entries $\lambda_i$. Then $e^{\tilde{A}}$ is also
diagonal, and its diagonal entries are $e^{\lambda_i}$. Therefore we can compute $e^A$ explicitly:

(8.7) $$e^A = P^{-1}e^{\tilde{A}}P.$$




In order to use the matrix exponential to solve systems of differential equations,
we need to extend some of the properties of the ordinary exponential to it.
The most fundamental property is $e^{x+y} = e^x e^y$. This property can be expressed as
a formal identity between the two infinite series which are obtained by expanding

(8.8) $$e^{x+y} = 1 + (x + y)/1! + (x + y)^2/2! + \cdots$$

and

$$e^x e^y = (1 + x/1! + x^2/2! + \cdots)(1 + y/1! + y^2/2! + \cdots).$$

We can not substitute matrices into this identity because the commutative law is
needed to obtain equality of the two series. For instance, the quadratic terms of
(8.8), computed without the commutative law, are $\frac{1}{2}(x^2 + xy + yx + y^2)$ and
$\frac{1}{2}x^2 + xy + \frac{1}{2}y^2$. They are not equal unless $xy = yx$. So there is no reason to expect
$e^{A+B}$ to equal $e^A e^B$ in general. However, if two matrices $A$ and $B$ happen to
commute, the formal identity can be applied.
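
A small numerical experiment makes the role of commutativity concrete. This sketch sums the exponential series to 40 terms (an arbitrary cutoff) and compares $e^{A+B}$ with $e^A e^B$ for one non-commuting pair and one commuting pair. The matrices and function names are chosen for illustration only.

```python
import math

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(len(X))] for i in range(len(X))]

def exp_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... (an approximation to e^A)."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[e / k for e in row] for row in mat_mul(term, A)]
        total = mat_add(total, term)
    return total

def max_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j])
               for i in range(len(X)) for j in range(len(X)))

# Non-commuting pair: AB != BA, and the two exponentials differ.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
gap = max_diff(exp_series(mat_add(A, B)), mat_mul(exp_series(A), exp_series(B)))

# Commuting pair: a scalar matrix commutes with everything.
C = [[2.0, 0.0], [0.0, 2.0]]
D = [[0.0, 3.0], [0.0, 0.0]]
commuting_gap = max_diff(exp_series(mat_add(C, D)),
                         mat_mul(exp_series(C), exp_series(D)))
print(gap, commuting_gap)
```

For the first pair, $e^A e^B = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}$ while $e^{A+B}$ has entries $\cosh 1$ and $\sinh 1$, so the gap is visibly nonzero; for the commuting pair it vanishes up to rounding.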


(8.9) Proposition.

(a) The formal expansions of (8.8), with commuting variables $x, y$, are equal.

(b) Let $A, B$ be complex $n \times n$ matrices which commute: $AB = BA$. Then
$e^{A+B} = e^A e^B$.

The proof is at the end of the section. □


(8.10) Corollary. For any $n \times n$ complex matrix $A$, the exponential $e^A$ is invertible,
and its inverse is $e^{-A}$.

This follows from the proposition because $A$ and $-A$ commute, and hence
$e^A e^{-A} = e^{A-A} = e^0 = I$. □


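As a sample application of Proposition (8.9b), consider the matrix

(8.11) $$A = \begin{bmatrix} 2 & 3 \\ 0 & 2 \end{bmatrix}.$$

We can compute its exponential by writing it in the form $A = 2I + B$, where
$B = 3e_{12}$. Since $2I$ commutes with $B$, Proposition (8.9b) applies: $e^A = e^{2I}e^B$, and
from the series expansion we read off the values $e^{2I} = e^2 I$ and $e^B = I + B$. Thus

$$e^A = \begin{bmatrix} e^2 & 0 \\ 0 & e^2 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e^2 & 3e^2 \\ 0 & e^2 \end{bmatrix}.$$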

We now come to the main result relating the matrix exponential to differential
equations. Given an $n \times n$ matrix $A$, we consider the exponential $e^{tA}$, $t$ being a variable
scalar, as a matrix-valued function:

(8.12) $$e^{tA} = I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots.$$


(8.13) Proposition. $e^{tA}$ is a differentiable function of $t$, and its derivative is $Ae^{tA}$.
The proof is at the end of the section. □


(8.14) Theorem. Let $A$ be a real or complex $n \times n$ matrix. The columns of the
matrix $e^{tA}$ form a basis for the vector space of solutions of the differential equation

$$\frac{dX}{dt} = AX.$$

We will need the following lemma, whose proof is an exercise:

(8.15) Lemma. Product rule: Let $A(t)$ and $B(t)$ be differentiable matrix-valued
functions of $t$, of suitable sizes so that their product is defined. Then the matrix
product $A(t)B(t)$ is differentiable, and its derivative is

$$\frac{d}{dt}\bigl(A(t)B(t)\bigr) = \frac{dA}{dt}B + A\frac{dB}{dt}. \qquad □$$


Proof of Theorem (8.14). Proposition (8.13) shows that the columns of $e^{tA}$
solve the differential equation, because differentiation and multiplication by $A$ act
independently on the columns of the matrix $e^{tA}$. To show that every solution is a linear
combination of the columns, we copy the proof given at the beginning of Section 7.
Let $X(t)$ be an arbitrary solution of (7.5). We differentiate the matrix product
$e^{-tA}X(t)$, obtaining

$$\frac{d}{dt}\bigl(e^{-tA}X(t)\bigr) = -Ae^{-tA}X(t) + e^{-tA}AX(t).$$

Fortunately, $A$ and $e^{-tA}$ commute. This follows directly from the definition of the
exponential. So the derivative is zero. Therefore, $e^{-tA}X(t)$ is a constant column vector,
say $C = (c_1, \ldots, c_n)^t$, and $X(t) = e^{tA}C$. This expresses $X(t)$ as a linear combination of
the columns of $e^{tA}$. The expression is unique because $e^{tA}$ is an invertible matrix. □


According to Theorem (8.14), the matrix exponential always solves the differential
equation (7.5). Since direct computation of the exponential can be quite
difficult, this theorem may not be easy to apply in a concrete situation. But if $A$ is a
diagonalizable matrix, then the exponential can be computed as in (8.7):
$e^A = P^{-1}e^{\tilde{A}}P$. We can use this method of evaluating $e^{tA}$ to solve equation (7.5), but
of course it gives the same result as before. Thus if $A$ is the matrix used in example
(7.7), so that $P, \tilde{A}$ are as in (7.17), then

$$e^{t\tilde{A}} = \begin{bmatrix} e^{5t} & 0 \\ 0 & e^{2t} \end{bmatrix}$$

and

$$e^{tA} = P^{-1}e^{t\tilde{A}}P = -\frac{1}{3}\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} e^{5t} & 0 \\ 0 & e^{2t} \end{bmatrix}\begin{bmatrix} -1 & -2 \\ -1 & 1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} e^{5t} + 2e^{2t} & 2e^{5t} - 2e^{2t} \\ e^{5t} - e^{2t} & 2e^{5t} + e^{2t} \end{bmatrix}.$$

The columns we have obtained form a second basis for the general solution (7.18).


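On the other hand, the matrix $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, which represents the system of
equations

(8.16) $$\frac{dx}{dt} = x, \qquad \frac{dy}{dt} = x + y,$$

is not diagonalizable. So the method of Section 7 can not be applied. To solve it, we
write $At = It + Bt$, where $B = e_{21}$, and find, as in the discussion of (8.11),

(8.17) $$e^{At} = e^{It}e^{Bt} = \begin{bmatrix} e^t & 0 \\ 0 & e^t \end{bmatrix}\begin{bmatrix} 1 & 0 \\ t & 1 \end{bmatrix} = \begin{bmatrix} e^t & 0 \\ te^t & e^t \end{bmatrix}.$$

Thus the solutions of (8.16) are linear combinations of the columns

(8.18) $$\begin{bmatrix} e^t \\ te^t \end{bmatrix}, \qquad \begin{bmatrix} 0 \\ e^t \end{bmatrix}.$$

To compute the exponential explicitly in all cases requires putting the matrix into
Jordan form (see Chapter 12).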
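
As a check on the non-diagonalizable example, here is a short worked verification, assuming the coefficient matrix is $A = I + e_{21}$, that is, the system $dx/dt = x$, $dy/dt = x + y$. Since $B = e_{21}$ satisfies $B^2 = 0$, the series for $e^{tB}$ stops after two terms, and one can confirm by direct differentiation that the first column of $e^{tA}$ solves the system.

```latex
\[
e^{tB} = I + tB = \begin{bmatrix} 1 & 0 \\ t & 1 \end{bmatrix},
\qquad
e^{tA} = e^{tI} e^{tB}
       = e^{t}\begin{bmatrix} 1 & 0 \\ t & 1 \end{bmatrix}
       = \begin{bmatrix} e^{t} & 0 \\ t e^{t} & e^{t} \end{bmatrix}.
\]
\[
\frac{d}{dt}\begin{bmatrix} e^{t} \\ t e^{t} \end{bmatrix}
= \begin{bmatrix} e^{t} \\ e^{t} + t e^{t} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
  \begin{bmatrix} e^{t} \\ t e^{t} \end{bmatrix}.
\]
```

The factor $t e^{t}$ is the new feature: it cannot arise as a combination of pure exponentials $e^{\lambda t}v$, which is why the eigenvector method of Section 7 does not reach this solution.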

We now go back to prove Propositions (8.3), (8.9), and (8.13). For want of a
more compact notation, we will denote the $i,j$-entry of a matrix $A$ by $A_{ij}$ here. So
$(AB)_{ij}$ will stand for the entry of the product matrix $AB$, and $(A^k)_{ij}$ for the entry of $A^k$.
With this notation, the $i,j$-entry of $e^A$ is the sum of the series

(8.19) $$(e^A)_{ij} = I_{ij} + A_{ij} + \frac{1}{2!}(A^2)_{ij} + \frac{1}{3!}(A^3)_{ij} + \cdots.$$

In order to prove that the series for the exponential converges, we need to
show that the entries of the powers $A^k$ of a given matrix do not grow too fast, so that
the absolute values of the $i,j$-entries form a bounded (and hence convergent) series.
Let us define the norm of an $n \times n$ matrix $A$ to be the maximum absolute value of the
matrix entries:

(8.20) $$\|A\| = \max_{i,j} |A_{ij}|.$$

In other words, $\|A\|$ is the smallest real number such that

(8.21) $$|A_{ij}| \le \|A\| \quad \text{for all } i, j.$$

This is one of several possible definitions of the norm. Its basic property is as
follows:


(8.22) Lemma. Let $A, B$ be complex $n \times n$ matrices. Then $\|AB\| \le n\|A\|\|B\|$,
and $\|A^k\| \le n^{k-1}\|A\|^k$ for all $k > 0$.

Proof. We estimate the size of the $i,j$-entry of $AB$:

$$|(AB)_{ij}| = \Bigl|\sum_{\nu=1}^{n} A_{i\nu}B_{\nu j}\Bigr| \le \sum_{\nu=1}^{n} |A_{i\nu}||B_{\nu j}| \le n\|A\|\|B\|.$$

Thus $\|AB\| \le n\|A\|\|B\|$. The second inequality follows by induction from the first
inequality. □
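
The estimate $\|AB\| \le n\|A\|\|B\|$ is easy to test on examples. The sketch below uses the maximum-entry norm defined above; the sample matrices are arbitrary and the function names are hypothetical. The all-ones matrix shows that the factor $n$ cannot be dropped.

```python
def max_norm(M):
    """Largest absolute value of an entry of M (works for complex entries)."""
    return max(abs(entry) for row in M for entry in row)

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n = 3
A = [[1, -2, 0], [3, 1j, 4], [0, 5, -1]]
B = [[2, 0, 1], [-1, 3, 2 - 1j], [4, 1, 0]]
lhs = max_norm(mat_mul(A, B))
rhs = n * max_norm(A) * max_norm(B)

# All-ones matrix J: J*J = n*J, so ||J*J|| = n = n * ||J|| * ||J||.
J = [[1] * n for _ in range(n)]
sharp = max_norm(mat_mul(J, J))
print(lhs, rhs, sharp)
```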

Proof of Proposition (8.3). To prove that the matrix exponential converges
absolutely, we estimate the series as follows: Let $a = n\|A\|$. Then

(8.23) $$\begin{aligned} |(e^A)_{ij}| &\le |I_{ij}| + |A_{ij}| + \frac{1}{2!}|(A^2)_{ij}| + \frac{1}{3!}|(A^3)_{ij}| + \cdots \\ &\le 1 + \|A\| + \frac{1}{2!}n\|A\|^2 + \frac{1}{3!}n^2\|A\|^3 + \cdots \\ &= 1 + \Bigl(a + \frac{1}{2!}a^2 + \frac{1}{3!}a^3 + \cdots\Bigr)/n = 1 + (e^a - 1)/n. \qquad □ \end{aligned}$$


Proof of Proposition (8.9).

(a) The terms of degree $k$ in the expansions of (8.8) are

$$(x + y)^k/k! = \sum_{r+s=k} \binom{k}{r} x^r y^s/k! \quad \text{and} \quad \sum_{r+s=k} \frac{x^r}{r!}\frac{y^s}{s!}.$$

To show that these terms are equal, we have to show that

$$\binom{k}{r}\Big/k! = \frac{1}{r!\,s!}, \quad \text{or} \quad \binom{k}{r} = \frac{k!}{r!\,s!},$$

for all $k$ and all $r, s$ such that $r + s = k$. This is a standard formula for binomial
coefficients.

(b) Denote by $S_n(x)$ the partial sum $1 + x/1! + x^2/2! + \cdots + x^n/n!$. Then

$$S_n(x)S_n(y) = (1 + x/1! + x^2/2! + \cdots + x^n/n!)(1 + y/1! + y^2/2! + \cdots + y^n/n!) = \sum_{r,s=0}^{n} \frac{x^r}{r!}\frac{y^s}{s!},$$

while

$$S_n(x + y) = (1 + (x + y)/1! + (x + y)^2/2! + \cdots + (x + y)^n/n!) = \sum_{k=0}^{n}\sum_{r+s=k} \binom{k}{r} x^r y^s/k! = \sum_{k=0}^{n}\sum_{r+s=k} \frac{x^r}{r!}\frac{y^s}{s!}.$$

Comparing terms, we find that the expansion of the partial sum $S_n(x + y)$
consists of the terms in $S_n(x)S_n(y)$ such that $r + s \le n$. The same is true when
we substitute commuting matrices $A, B$ for $x, y$. We must show that the sum of
the remaining terms tends to zero as $k \to \infty$.


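(8.24) Lemma. The series $\displaystyle\sum_k \sum_{r+s=k} \Bigl|\Bigl(\frac{A^r}{r!}\frac{B^s}{s!}\Bigr)_{ij}\Bigr|$ converges for all $i, j$.

Proof. Let $a = n\|A\|$ and $b = n\|B\|$. We estimate the terms in the sum. According
to (8.22), $|(A^rB^s)_{ij}| \le n(n^{r-1}\|A\|^r)(n^{s-1}\|B\|^s) \le a^r b^s$. Therefore

$$\sum_k \sum_{r+s=k} \Bigl|\Bigl(\frac{A^r}{r!}\frac{B^s}{s!}\Bigr)_{ij}\Bigr| \le \sum_k \sum_{r+s=k} \frac{a^r}{r!}\frac{b^s}{s!} = e^{a+b}. \qquad □$$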
The proposition follows from this lemma because, on the one hand, the $i,j$-entry of
$(S_k(A)S_k(B) - S_k(A + B))_{ij}$ is bounded by

$$\sum_{r+s>k} \Bigl|\Bigl(\frac{A^r}{r!}\frac{B^s}{s!}\Bigr)_{ij}\Bigr|.$$

According to the lemma, this sum tends to zero as $k \to \infty$. And on the other hand,

$$(S_k(A)S_k(B) - S_k(A + B)) \to (e^A e^B - e^{A+B}). \qquad □$$


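Proof of Proposition (8.13). By definition,

$$\frac{d}{dt}(e^{tA}) = \lim_{h \to 0} \frac{e^{(t+h)A} - e^{tA}}{h}.$$

Since the matrices $tA$ and $hA$ commute, Proposition (8.9) shows that

$$\frac{e^{(t+h)A} - e^{tA}}{h} = \Bigl(\frac{e^{hA} - I}{h}\Bigr)e^{tA}.$$

So our proposition follows from this lemma:


(8.25) Lemma. $\displaystyle\lim_{h \to 0} \frac{e^{hA} - I}{h} = A.$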


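Proof. The series expansion for the exponential shows that

(8.26) $$\frac{e^{hA} - I}{h} - A = \frac{h}{2!}A^2 + \frac{h^2}{3!}A^3 + \cdots.$$

We estimate this series: Let $a = |h|n\|A\|$. Then

$$\begin{aligned} \Bigl|\Bigl(\frac{h}{2!}A^2 + \frac{h^2}{3!}A^3 + \cdots\Bigr)_{ij}\Bigr| &\le \Bigl|\frac{h}{2!}\Bigr||(A^2)_{ij}| + \Bigl|\frac{h^2}{3!}\Bigr||(A^3)_{ij}| + \cdots \\ &\le \frac{1}{2!}|h|n\|A\|^2 + \frac{1}{3!}|h|^2n^2\|A\|^3 + \cdots \\ &= \|A\|\Bigl(\frac{1}{2!}a + \frac{1}{3!}a^2 + \cdots\Bigr) = \frac{\|A\|}{a}(e^a - 1 - a) = \|A\|\Bigl(\frac{e^a - 1}{a} - 1\Bigr). \end{aligned}$$

Note that $a \to 0$ as $h \to 0$. Since the derivative of $e^x$ is $e^x$,

$$\lim_{a \to 0} \frac{e^a - 1}{a} = \frac{d}{dx}e^x\Big|_{x=0} = e^0 = 1.$$

So (8.26) tends to zero with $h$. □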


We will use the remarkable properties of the matrix exponential again, in 
Chapter 8. 
I have not thought it necessary to undertake the labour
of a formal proof of the theorem in the general case. 

Arthur Cayley 

EXERCISES 


1 . The Dimension Formula 



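1. Let $T$ be left multiplication by the matrix

$$\begin{bmatrix} 1 & 2 & 0 & -1 & 5 \\ 2 & 0 & 2 & 0 & 1 \\ 1 & 1 & -1 & 3 & 2 \\ 0 & 3 & -3 & 2 & 6 \end{bmatrix}.$$

Compute ker $T$ and im $T$ explicitly by exhibiting bases for these spaces, and verify (1.7).

2. Determine the rank of the matrix

$$\begin{bmatrix} 11 & 12 & 13 & 14 \\ 21 & 22 & 23 & 24 \\ 31 & 32 & 33 & 34 \\ 41 & 42 & 43 & 44 \end{bmatrix}.$$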


3. Let $T: V \to W$ be a linear transformation. Prove that ker $T$ is a subspace of $V$ and that
im $T$ is a subspace of $W$.

4. Let $A$ be an $m \times n$ matrix. Prove that the space of solutions of the linear system $AX = 0$
has dimension at least $n - m$.

5. Let $A$ be a $k \times m$ matrix and let $B$ be an $n \times p$ matrix. Prove that the rule $M \mapsto AMB$
defines a linear transformation from the space $F^{m \times n}$ of $m \times n$ matrices to the space
$F^{k \times p}$.

6. Let $(v_1, \ldots, v_n)$ be a subset of a vector space $V$. Prove that the map $\varphi: F^n \to V$
defined by $\varphi(X) = v_1x_1 + \cdots + v_nx_n$ is a linear transformation.

7. When the field is one of the fields $\mathbb{F}_p$, finite-dimensional vector spaces have finitely many
elements. In this case, formula (1.6) and formula (6.15) from Chapter 2 both apply.
Reconcile them.

8. Prove that every $m \times n$ matrix $A$ of rank 1 has the form $A = XY^t$, where $X, Y$ are $m$- and
$n$-dimensional column vectors.

9. (a) The left shift operator $S^-$ on $V = \mathbb{R}^\infty$ is defined by $(a_1, a_2, \ldots) \mapsto (a_2, a_3, \ldots)$.
Prove that ker $S^- > 0$, but im $S^- = V$.

(b) The right shift operator $S^+$ on $V = \mathbb{R}^\infty$ is defined by $(a_1, a_2, \ldots) \mapsto (0, a_1, a_2, \ldots)$.
Prove that ker $S^+ = 0$, but im $S^+ < V$.


2. The Matrix of a Linear Transformation 

1. Determine the matrix of the differentiation operator $\frac{d}{dx}: P_n \to P_{n-1}$ with respect to the
natural bases (see (1.4)).

2. Find all linear transformations $T: \mathbb{R}^2 \to \mathbb{R}^2$ which carry the line $y = x$ to the line
$y = 3x$.

3. Prove Proposition (2.9b) using row and column operations.

4. Let $T: \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by the rule $T(x_1, x_2, x_3)^t =
(x_1 + x_2, 2x_3 - x_1)^t$. What is the matrix of $T$ with respect to the standard bases?

5. Let $A$ be an $n \times n$ matrix, and let $V = F^n$ denote the space of row vectors. What is the
matrix of the linear operator "right multiplication by $A$" with respect to the standard basis
of $V$?

6. Prove that different matrices define different linear transformations.

7. Describe left multiplication and right multiplication by the matrix (2.10), and prove that
the rank of this matrix is $r$.

8. Prove that $A$ and $A^t$ have the same rank.

9. Let $T_1, T_2$ be linear transformations from $V$ to $W$. Define $T_1 + T_2$ and $cT$ by the rules
$[T_1 + T_2](v) = T_1(v) + T_2(v)$ and $[cT](v) = cT(v)$.

(a) Prove that $T_1 + T_2$ and $cT_1$ are linear transformations, and describe their matrices in
terms of the matrices for $T_1, T_2$.

(b) Let $L$ be the set of all linear transformations from $V$ to $W$. Prove that these laws
make $L$ into a vector space, and compute its dimension.


3. Linear Operators and Eigenvectors



Let V be the vector space of real 2x2 symmetric matrices X 
2 1 


A 


^ y 
iy z 


， and let 


.Determine the matrix of the linear operator on V defined by X^ w ^AXA t , 


with respect to a suitable basis. 

2. Let $A = (a_{ij})$, $B = (b_{ij})$ be $2 \times 2$ matrices, and consider the operator $T: M \mapsto AMB$ on
the space $F^{2 \times 2}$ of $2 \times 2$ matrices. Find the matrix of $T$ with respect to the basis
$(e_{11}, e_{12}, e_{21}, e_{22})$ of $F^{2 \times 2}$.

3. Let $T: V \to V$ be a linear operator on a vector space of dimension 2. Assume that $T$ is
not multiplication by a scalar. Prove that there is a vector $v \in V$ such that $(v, T(v))$ is a
basis of $V$, and describe the matrix of $T$ with respect to that basis.

4. Let $T$ be a linear operator on a vector space $V$, and let $c \in F$. Let $W$ be the set of
eigenvectors of $T$ with eigenvalue $c$, together with 0. Prove that $W$ is a $T$-invariant subspace.

5. Find all invariant subspaces of the real linear operator whose matrix is as follows. 


⑻ 


(b) 



6. An operator on a vector space $V$ is called nilpotent if $T^k = 0$ for some $k$. Let $T$ be a
nilpotent operator, and let $W^i = \text{im } T^i$.

(a) Prove that if $W^i \neq 0$, then $\dim W^{i+1} < \dim W^i$.

(b) Prove that if $V$ is a space of dimension $n$ and if $T$ is nilpotent, then $T^n = 0$.

7. Let $T$ be a linear operator on $\mathbb{R}^2$. Prove that if $T$ carries a line $\ell$ to $\ell$, then it also carries
every line parallel to $\ell$ to another line parallel to $\ell$.

8. Prove that the composition $T_1 \circ T_2$ of linear operators on a vector space is a linear operator,
and compute its matrix in terms of the matrices $A_1, A_2$ of $T_1, T_2$.

9. Let $P$ be the real vector space of polynomials $p(x) = a_0 + a_1x + \cdots + a_nx^n$ of degree
$\le n$, and let $D$ denote the derivative $\frac{d}{dx}$, considered as a linear operator on $P$.

(a) Find the matrix of $D$ with respect to a convenient basis, and prove that $D$ is a nilpotent
operator.

(b) Determine all the $D$-invariant subspaces.


10. Prove that the matrices $\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$ and $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$ $(b \neq 0)$ are similar if and only if $a \neq d$.


11. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a real $2 \times 2$ matrix. Prove that $A$ can be reduced to a matrix
$\begin{bmatrix} 0 & * \\ * & * \end{bmatrix}$ by row and column operations of the form $A \to EAE^{-1}$, unless $b = c = 0$ and
$a = d$. Make a careful case analysis to take care of the possibility that $b$ or $c$ is zero.

12. Let $T$ be a linear operator on $\mathbb{R}^2$ with two linearly independent eigenvectors $v_1, v_2$. Assume
that the eigenvalues $c_1, c_2$ of these operators are positive and that $c_1 > c_2$. Let $\ell_i$ be
the line spanned by $v_i$.

(a) The operator $T$ carries every line $\ell$ through the origin to another line. Using the
parallelogram law for vector addition, show that every line $\ell \neq \ell_2$ is shifted away from
$\ell_2$ toward $\ell_1$.

(b) Use (a) to prove that the only eigenvectors are multiples of $v_1$ or $v_2$.

(c) Describe the effect on lines when there is a single line carried to itself, with positive
eigenvalue.


13. Consider an arbitrary $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. The condition that a column vector $X = (x_1, x_2)^t$
be an eigenvector for left multiplication by $A$ is that $Y = AX$ be parallel to $X$, which means
that the slopes $s = x_2/x_1$ and $s' = y_2/y_1$ are equal.

(a) Find the equation in $s$ which expresses this equality.

(b) For which $A$ is $s = 0$ a solution? $s = \infty$?

(c) Prove that if the entries of A are positive real numbers, then there is an eigenvector in 
the first quadrant and also one in the second quadrant. 

4. The Characteristic Polynomial 

1. Compute the characteristic polynomials, eigenvalues, and eigenvectors of the following
complex matrices.


(a) 


-2 2 
-2 3 


(b) 


2. (a) Prove that the eigenvalues of a real symmetric 2x2 matrix are real numbers. 

(b) Prove that a real 2x2 matrix whose off-diagonal entries are positive has real 
eigenvalues. 

3. Find the complex eigenvalues and eigenvectors of the rotation matrix

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

4. Prove that a real $3 \times 3$ matrix has at least one real eigenvalue.

5. Determine the characteristic polynomial of the matrix

6. Prove Proposition (4.18).

7. (a) Let $T$ be a linear operator having two linearly independent eigenvectors with the
same eigenvalue $\lambda$. Is it true that $\lambda$ is a multiple root of the characteristic polynomial
of $T$?

(b) Suppose that $\lambda$ is a multiple root of the characteristic polynomial. Does $T$ have two
linearly independent eigenvectors with eigenvalue $\lambda$?

8. Let $V$ be a vector space with basis $(v_1, \ldots, v_n)$ over a field $F$, and let $c_1, \ldots, c_{n-1}$ be elements of $F$. Define a linear operator on $V$ by the rules $T(v_i) = v_{i+1}$ if $i < n$ and $T(v_n) = c_1 v_1 + c_2 v_2 + \cdots + c_{n-1} v_{n-1}$.

(a) Determine the matrix of $T$ with respect to the given basis.

(b) Determine the characteristic polynomial of $T$.

9. Do $A$ and $A^t$ have the same eigenvalues? the same eigenvectors?

10. (a) Use the characteristic polynomial to prove that a 2 x 2 real matrix $P$ all of whose entries are positive has two distinct real eigenvalues.

(b) Prove that the larger eigenvalue has an eigenvector in the first quadrant, and the 
smaller eigenvalue has an eigenvector in the second quadrant. 

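11. (a) Let $A$ be a 3 x 3 matrix, with characteristic polynomial

$$p(t) = t^3 - (\operatorname{tr} A)\,t^2 + s_1 t - (\det A).$$

Prove that $s_1$ is the sum of the symmetric 2x2 subdeterminants:

$$s_1 = \det\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \det\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix} + \det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}.$$

*(b) Generalize to n x n matrices.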

12. Let $T$ be a linear operator on a space of dimension $n$, with eigenvalues $\lambda_1, \ldots, \lambda_n$.

(a) Prove that $\operatorname{tr} T = \lambda_1 + \cdots + \lambda_n$ and that $\det T = \lambda_1 \cdots \lambda_n$.

(b) Determine the other coefficients of the characteristic polynomial in terms of the eigenvalues.

*13. Consider the linear operator of left multiplication by an $n \times n$ matrix $A$ on the space $F^{n \times n}$ of all $n \times n$ matrices. Compute the trace and the determinant of this operator.

*14. Let $P$ be a real matrix such that $P^t = P^2$. What are the possible eigenvalues of $P$?

15. Let $A$ be a matrix such that $A^n = I$. Prove that the eigenvalues of $A$ are powers of the $n$th root of unity $\zeta_n = e^{2\pi i/n}$.


5. Orthogonal Matrices and Rotations 


1. What is the matrix of the three-dimensional rotation through the angle $\theta$ about the axis $e_2$?

2. Prove that every orthonormal set of $n$ vectors in $\mathbb{R}^n$ is a basis.

3. Prove algebraically that a real 2x2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ represents a rotation if and only if it is in $SO_2$.

4. (a) Prove that $O_n$ and $SO_n$ are subgroups of $GL_n(\mathbb{R})$, and determine the index of $SO_n$ in $O_n$.

(b) Is $O_2$ isomorphic to the product group $SO_2 \times \{\pm I\}$? Is $O_3$ isomorphic to $SO_3 \times \{\pm I\}$?





5. What are the eigenvalues of the matrix $A$ which represents the rotation of $\mathbb{R}^3$ by $\theta$ about an axis $v$?

6. Let $A$ be a matrix in $O_3$ whose determinant is -1. Prove that -1 is an eigenvalue of $A$.

7. Let $A$ be an orthogonal 2x2 matrix whose determinant is -1. Prove that $A$ represents a
reflection about a line through the origin.

8. Let $A$ be an element of $SO_3$, with angle of rotation $\theta$. Show that $\cos\theta = \tfrac{1}{2}(\operatorname{tr} A - 1)$.

9. Every real polynomial of degree 3 has a real root. Use this fact to give a less tricky proof 
of Lemma (5.23). 

*10. Find a geometric way to determine the axis of rotation for the composition of two three- 
dimensional rotations. 

11. Let $v$ be a vector of unit length, and let $P$ be the plane in $\mathbb{R}^3$ orthogonal to $v$. Describe a
bijective correspondence between points on the unit circle in $P$ and matrices $P \in SO_3$
whose first column is $v$.

12. Describe geometrically the action of an orthogonal matrix with determinant -1.

13. Prove that a rigid motion, as defined by (5.15), is bijective.

*14. Let $A$ be an element of $SO_3$. Show that if it is defined, the vector

$$\left((a_{23} + a_{32})^{-1},\ (a_{13} + a_{31})^{-1},\ (a_{12} + a_{21})^{-1}\right)^t$$

is an eigenvector with eigenvalue 1.

6. Diagonalization 


1. (a) Find the eigenvectors and eigenvalues of the matrix 



(b) Find a matrix $P$ such that $PAP^{-1}$ is diagonal.

「2 ll 30 

(c) Compute ^ ^ • 

- ■ 

2. Diagonalize the rotation matrix $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$, using complex numbers.

3. Prove that if $A$, $B$ are $n \times n$ matrices and if $A$ is nonsingular, then $AB$ is similar to $BA$.

4. Let A be a complex matrix having zero as its only eigenvalue. Prove or disprove: A is 
nilpotent. 


5. In each case, if the matrix is diagonalizable, find a matrix $P$ such that $PAP^{-1}$ is diagonal.



6. Can the diagonalization (6.1) be done with a matrix $P \in SL_n$?


7. Prove that a linear operator $T$ is nilpotent if and only if there is a basis of $V$ such that the
matrix of $T$ is upper triangular, with diagonal entries zero.

8. Let $T$ be a linear operator on a space of dimension 2. Assume that the characteristic polynomial of $T$ is $(t - a)^2$. Prove that there is a basis of $V$ such that the matrix of $T$ has one
of the two forms
9. Let $A$ be a nilpotent matrix. Prove that $\det(I + A) = 1$.

10. Prove that if $A$ is a nilpotent $n \times n$ matrix, then $A^n = 0$.

11. Find all real 2x2 matrices such that $A^2 = I$, and describe geometrically the way they
operate by left multiplication on $\mathbb{R}^2$.

12. Let $M$ be a matrix made up of two diagonal blocks: $M = \begin{bmatrix} A & 0 \\ 0 & D \end{bmatrix}$. Prove that $M$ is diagonalizable if and only if $A$ and $D$ are.

13. (a) Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a 2 x 2 matrix with eigenvalue $\lambda$. Show that $(b, \lambda - a)^t$ is an eigenvector for $A$.

(b) Find a matrix $P$ such that $PAP^{-1}$ is diagonal, if $A$ has two distinct eigenvalues
$\lambda_1 \neq \lambda_2$.

14. Let $A$ be a complex $n \times n$ matrix. Prove that there is a matrix $B$ arbitrarily close to $A$
(meaning that $|b_{ij} - a_{ij}|$ can be made arbitrarily small for all $i, j$) such that $B$ has $n$ distinct eigenvalues.

*15. Let $A$ be a complex $n \times n$ matrix with $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$. Assume that $\lambda_1$ is
the largest eigenvalue, that is, that $|\lambda_1| > |\lambda_i|$ for all $i > 1$. Prove that for most vectors
$X$ the sequence $X_k = \lambda_1^{-k} A^k X$ converges to an eigenvector $Y$ with eigenvalue $\lambda_1$, and describe precisely what the conditions on $X$ are for this to be the case.

16. (a) Use the method of the previous problem to compute the largest eigenvalue of the matrix

4

to three-place accuracy.

(b) Compute the largest eigenvalue of the matrix

$$\begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$$

to three-place accuracy.


*17. Let $A$ be an $m \times m$ and $B$ be an $n \times n$ complex matrix, and consider the linear operator $T$ on
the space $F^{m \times n}$ of all complex $m \times n$ matrices defined by $T(M) = AMB$.

(a) Show how to construct an eigenvector for $T$ out of a pair of column vectors $X$, $Y$,
where $X$ is an eigenvector for $A$ and $Y$ is an eigenvector for $B^t$.

(b) Determine the eigenvalues of $T$ in terms of those of $A$ and $B$.

*18. Let $A$ be an $n \times n$ complex matrix.

(a) Consider the linear operator $T$ defined on the space $F^{n \times n}$ of all complex $n \times n$
matrices by the rule $T(B) = AB - BA$. Prove that the rank of this operator is at most
$n^2 - n$.

(b) Determine the eigenvalues of $T$ in terms of the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$.

7. Systems of Differential Equations

1. Let $v$ be an eigenvector for the matrix $A$, with eigenvalue $c$. Prove that $e^{ct}v$ solves the
differential equation $dX/dt = AX$.

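2. Solve the equation $dX/dt = AX$ for the following matrices $A$:

(a) $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$

(b) $\begin{bmatrix} -2 & 2 \\ -2 & 3 \end{bmatrix}$

(c) $\begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix}$

(d) $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$

(e) $\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$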


3. Explain why diagonalization gives the general solution. 
4. (a) Prove Proposition (7.16).

(b) Why is it enough to write down the real and imaginary parts to get the general 
solution? 


5. Prove Lemma (7.25).

6. Solve the inhomogeneous differential equation $dX/dt = AX + B$ in terms of the solutions to
the homogeneous equation $dX/dt = AX$.


7. A differential equation of the form $d^n x/dt^n + a_{n-1}\, d^{n-1}x/dt^{n-1} + \cdots + a_1\, dx/dt + a_0 x = 0$ can be rewritten as a system of first-order equations by the following trick: We
introduce unknown functions $x_0, x_1, \ldots, x_{n-1}$ with $x = x_0$, and we set $dx_i/dt = x_{i+1}$ for
$i = 0, \ldots, n - 2$. The original equation can be rewritten as the system $dx_i/dt = x_{i+1}$,
$i = 0, \ldots, n - 2$, and $dx_{n-1}/dt = -(a_{n-1}x_{n-1} + \cdots + a_1 x_1 + a_0 x_0)$. Determine the matrix which represents this system of equations.

8. (a) Rewrite the second-order linear equation in one variable

$$\frac{d^2x}{dt^2} + b\,\frac{dx}{dt} + cx = 0$$

as a system of two first-order equations in two unknowns $x_0 = x$, $x_1 = dx/dt$.

(b) Solve the system when $b = -4$ and $c = 3$.

9. Let $A$ be an $n \times n$ matrix, and let $B(t)$ be a column vector of continuous functions on the
interval $[\alpha, \beta]$. Define $F(t) = \int_\alpha^t e^{-tA} B(t)\, dt$.

(a) Prove that $X = e^{tA}F(t)$ is a solution of the differential equation $X' = AX + B(t)$ on the
interval.

(b) Determine all solutions of this equation on the interval.


8. The Matrix Exponential 

1. Compute e A for the following matrices A: 


(a)


2 * Let ^ 


1 


(b) 


a b 


2 


(a) Compute e A directly from the expansion. 

(b) Compute e A by diagonalizing the matrix. 

3. Compute e A for the following matrices A: 


(a)


0 -b 
b 0 


(b) 


0 


(c) 


0 1 


4 W 


•0 


4. Compute e A for the following matrices A: 


(a)


l/rri 2 tti 
2iri 


(b) 


4iri 
2iri 877 / 


5. Let $A$ be an $n \times n$ matrix. Prove that the map $t \mapsto e^{tA}$ is a homomorphism from the additive group $\mathbb{R}^+$ to $GL_n(\mathbb{C})$.

6. Find two matrices $A$, $B$ such that $e^{A+B} \neq e^A e^B$.

7. Prove the formula $e^{\operatorname{trace} A} = \det(e^A)$.


8. Solve the differential equation $dX/dt = AX$, when $A =$


2 

0 


2 


9. Let $f(t)$ be a polynomial, and let $T$ be a linear operator. Prove that $f(T)$ is a linear
operator.

10. Let $A$ be a symmetric matrix, and let $f(t)$ be a polynomial. Prove that $f(A)$ is symmetric.
11. Prove the product rule for differentiation of matrix-valued functions. 

12. Let $A(t)$, $B(t)$ be differentiable matrix-valued functions of $t$. Compute the following.

(a) $d/dt\,(A(t)^3)$

(b) $d/dt\,(A(t)^{-1})$, assuming that $A(t)$ is invertible for all $t$

(c) $d/dt\,(A(t)^{-1}B(t))$


13. Let $X$ be an eigenvector of an $n \times n$ matrix $A$, with eigenvalue $\lambda$.

(a) Prove that if $A$ is invertible then $X$ is also an eigenvector for $A^{-1}$, and that its eigenvalue is $\lambda^{-1}$.

(b) Let $p(t)$ be a polynomial. Then $X$ is an eigenvector for $p(A)$, with eigenvalue $p(\lambda)$.

(c) Prove that $X$ is an eigenvector for $e^A$, with eigenvalue $e^\lambda$.

14. For an nx n matrix A, define sin A and cos A by using the Taylor’s series expansions for 
sin x and cos x. 


(a) Prove that these series converge for all A. 

(b) Prove that $\sin tA$ is a differentiable function of $t$ and that $d(\sin tA)/dt = A \cos tA$.


15. Discuss the range of validity of the following identities.

(a) $\cos^2 A + \sin^2 A = I$

(b) $e^{iA} = \cos A + i \sin A$

(c) $\sin(A + B) = \sin A \cos B + \cos A \sin B$

(d) $\cos(A + B) = \cos A \cos B - \sin A \sin B$

(e) $e^{2\pi i A} = I$

(f) $d(e^{A(t)})/dt = e^{A(t)} A'(t)$, where $A(t)$ is a differentiable matrix-valued function of $t$.


16. (a) Derive the product rule for differentiation of complex-valued functions in two ways:
directly, and by writing $x(t) = u(t) + iv(t)$ and applying the product rule for real-valued functions.

(b) Let $f(t)$ be a complex-valued function of a real variable $t$, and let $\varphi(u)$ be a real-valued function of $u$. State and prove the chain rule for $f(\varphi(u))$.

17. (a) Let $B_k$ be a sequence of $m \times n$ matrices which converges to a matrix $B$, and let $P$ be
an $m \times m$ matrix. Prove that $PB_k$ converges to $PB$.

(b) Prove that if $m = n$ and $P$ is invertible, then $PB_kP^{-1}$ converges to $PBP^{-1}$.

18. Let $f(x) = \sum c_k x^k$ be a power series such that $\sum c_k A^k$ converges when $A$ is a sufficiently
small $n \times n$ matrix. Prove that $A$ and $f(A)$ commute.


19. Determine $\frac{d}{dt} \det A(t)$, when $A(t)$ is a differentiable matrix function of $t$.


Miscellaneous Problems 


1. What are the possible eigenvalues of a linear operator $T$ such that (a) $T^r = I$,
(b) $T^r = 0$, (c) $T^2 - 5T + 6 = 0$?

2. A linear operator $T$ is called nilpotent if some power of $T$ is zero.

(a) Prove that $T$ is nilpotent if and only if its characteristic polynomial is $t^n$, $n = \dim V$.

(b) Prove that if $T$ is a nilpotent operator on a vector space of dimension $n$, then $T^n = 0$.

(c) A linear operator $T$ is called unipotent if $T - I$ is nilpotent. Determine the characteristic polynomial of a unipotent operator. What are its possible eigenvalues?

3. Let $A$ be an $n \times n$ complex matrix. Prove that if $\operatorname{trace} A^i = 0$ for all $i$, then $A$ is nilpotent.

*4. Let $A$, $B$ be complex $n \times n$ matrices, and let $C = AB - BA$. Prove that if $C$ commutes with
$A$ then $C$ is nilpotent.

5. Let $\lambda_1, \ldots, \lambda_n$ be the roots of the characteristic polynomial $p(t)$ of a complex matrix $A$.
Prove the formulas $\operatorname{trace} A = \lambda_1 + \cdots + \lambda_n$ and $\det A = \lambda_1 \cdots \lambda_n$.

6. Let $T$ be a linear operator on a real vector space $V$ such that $T^2 = I$. Define subspaces as
follows:

$$W^+ = \{v \in V \mid T(v) = v\}, \qquad W^- = \{v \in V \mid T(v) = -v\}.$$

Prove that $V$ is isomorphic to the direct sum $W^+ \oplus W^-$.

7. The Frobenius norm $|A|$ of an $n \times n$ matrix $A$ is defined to be the length of $A$ when it is
considered as an $n^2$-dimensional vector: $|A|^2 = \sum |a_{ij}|^2$. Prove the following inequalities: $|A + B| \le |A| + |B|$ and $|AB| \le |A|\,|B|$.

8. Let $T: V \to V$ be a linear operator on a finite-dimensional vector space $V$. Prove that
there is an integer $n$ so that $(\ker T^n) \cap (\operatorname{im} T^n) = 0$.

9. Which infinite matrices represent linear operators on the space Z [Chapter 3 (5.2d)]? 

*10. The $k \times k$ minors of an $m \times n$ matrix $A$ are the square submatrices obtained by crossing
out $m - k$ rows and $n - k$ columns. Let $A$ be a matrix of rank $r$. Prove that some $r \times r$
minor is invertible and that no $(r + 1) \times (r + 1)$ minor is invertible.

11. Let $\varphi: F^n \to F^m$ be left multiplication by an $m \times n$ matrix $A$. Prove that the following
are equivalent.

(a) $A$ has a right inverse, a matrix $B$ such that $AB = I$.

(b) $\varphi$ is surjective.

(c) There is an $m \times m$ minor of $A$ whose determinant is not zero.

12. Let $\varphi: F^n \to F^m$ be left multiplication by an $m \times n$ matrix $A$. Prove that the following
are equivalent.

(a) $A$ has a left inverse, a matrix $B$ such that $BA = I$.

(b) $\varphi$ is injective.

(c) There is an $n \times n$ minor of $A$ whose determinant is not zero.

*13. Let $A$ be an $n \times n$ matrix such that $A^r = I$. Prove that if $A$ has only one eigenvalue $\zeta$, then
$A = \zeta I$.

14. (a) Without using the characteristic polynomial, prove that a linear operator on a vector
space of dimension $n$ can have at most $n$ different eigenvalues.

(b) Use (a) to prove that a polynomial of degree n with coefficients in a field F has at 
most n roots in F. 

15. Let $A$ be an $n \times n$ matrix, and let $p(t) = t^n + c_{n-1}t^{n-1} + \cdots + c_1 t + c_0$ be its characteristic polynomial. The Cayley-Hamilton Theorem asserts that

$$p(A) = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = 0.$$

(a) Prove the Cayley-Hamilton Theorem for 2 x 2 matrices.

(b) Prove it for diagonal matrices.

(c) Prove it for diagonalizable matrices.

*(d) Show that every complex $n \times n$ matrix is arbitrarily close to a diagonalizable matrix,
and use this fact to extend the proof for diagonalizable matrices to all complex matrices by continuity.

16. (a) Use the Cayley-Hamilton Theorem to give an expression for $A^{-1}$ in terms of $A$,
$(\det A)^{-1}$, and the coefficients of the characteristic polynomial.

(b) Verify this expression in the 2 x 2 case by direct computation.

*17. Let $A$ be a 2 x 2 matrix. The Cayley-Hamilton Theorem allows all powers of $A$ to be
written as linear combinations of $I$ and $A$. Therefore it is plausible that $e^A$ is also such a
linear combination.

(a) Prove that if $a$, $b$ are the eigenvalues of $A$ and if $a \neq b$, then

$$e^A = \frac{ae^b - be^a}{a - b}\, I + \frac{e^a - e^b}{a - b}\, A.$$

(b) Find the correct formula for the case that $A$ has two equal eigenvalues.

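18. The Fibonacci numbers 0, 1, 1, 2, 3, 5, 8, ... are defined by the recursive relations
$f_n = f_{n-1} + f_{n-2}$, with the initial conditions $f_0 = 0$, $f_1 = 1$. This recursive relation can
be written in matrix form as

$$\begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} f_{n-2} \\ f_{n-1} \end{bmatrix} = \begin{bmatrix} f_{n-1} \\ f_n \end{bmatrix}.$$

(a) Prove the formula

$$f_n = \frac{1}{\alpha}\left[\left(\frac{1 + \alpha}{2}\right)^n - \left(\frac{1 - \alpha}{2}\right)^n\right],$$

where $\alpha = \sqrt{5}$.

(b) Suppose that the sequence $a_n$ is defined by the relation $a_n = \tfrac{1}{2}(a_{n-1} + a_{n-2})$. Compute $\lim a_n$ in terms of $a_0$, $a_1$.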

*19. Let $A$ be an $n \times n$ real positive matrix, and let $X \in \mathbb{R}^n$ be a column vector. Let us use the
shorthand notation $X > 0$ or $X \ge 0$ to mean that all entries of the vector $X$ are positive or
nonnegative, respectively. By "positive quadrant" we mean the set of vectors $X \ge 0$.
(But note that $X \ge 0$ and $X \neq 0$ do not imply $X > 0$ in our sense.)

(a) Prove that if $X \ge 0$ and $X \neq 0$ then $AX > 0$.

(b) Let $C$ denote the set of pairs $(X, t)$, $t \in \mathbb{R}$, such that $X \ge 0$, $|X| = 1$, and
$(A - tI)X \ge 0$. Prove that $C$ is a compact set in $\mathbb{R}^{n+1}$.

(c) The function $t$ takes on a maximum value on $C$, say at the point $(X_0, t_0)$. Then
$(A - t_0 I)X_0 \ge 0$. Prove that $(A - t_0 I)X_0 = 0$.

(d) Prove that $X_0$ is an eigenvector with eigenvalue $t_0$ by showing that otherwise the vector $AX_0 = X_1$ would contradict the maximality of $t_0$.

(e) Prove that $t_0$ is the eigenvalue of $A$ with largest absolute value.

*20. Let $A = A(t)$ be a matrix of functions. What goes wrong when you try to prove that, in
analogy with $n = 1$, the matrix

$$\exp\left(\int_{t_0}^{t} A(u)\, du\right)$$

is a solution of the system $dX/dt = AX$? Can you find conditions on the matrix function
$A(t)$ which will make this a solution?
Symmetry


Algebra is nothing but written geometry;
geometry is nothing but figured algebra.

Sophie Germain 


The study of symmetry provides one of the most appealing applications of group theory. Groups were first invented to analyze symmetries of certain algebraic structures
called field extensions, and because symmetry is a common phenomenon in all sciences, it is still one of the two main ways in which group theory is applied. The
other way is through group representations, which will be discussed in Chapter 9. In
the first four sections of this chapter, we will study the symmetry of plane figures in
terms of groups of rigid motions of the plane. Plane figures provide a rich source of
examples and a background for the general concept of group operation, which is introduced in Section 5.

When studying symmetry, we will allow ourselves to use geometric reasoning
without bothering to carry the arguments back to the axioms of geometry. That can
be left for another occasion.


1. SYMMETRY OF PLANE FIGURES


The possible symmetry of plane figures is usually classified into the main types
shown in Figures (1.1-1.3).

(1.1) Figure. Bilateral symmetry.

(1.2) Figure. Rotational symmetry.


(1.3) Figure. Translational symmetry.


A fourth type of symmetry also exists, though it may be slightly less familiar:

(1.4) Figure. Glide symmetry.

Figures such as wallpaper patterns may have two independent translational 
symmetries, as shown in Figure (1.5): 



(1.5) Figure. 

Other combinations of symmetries may also occur. For instance, the star has bilateral as well as rotational symmetry. Figure (1.6) is an example in which translational
and rotational symmetry are combined:

(1.6) Figure.

Another example is shown in Figure (1.7). 






As in Section 5 of Chapter 4, we call a map $m: P \to P$ from the plane $P$ to
itself a rigid motion, or an isometry, if it is distance-preserving, that is, if for any
two points $p, q \in P$ the distance from $p$ to $q$ is equal to the distance from $m(p)$ to
$m(q)$. We will show in the next section that the rigid motions are translations, rotations, reflections, and glide reflections. They form a group $M$ whose law of composition is composition of functions.

If a rigid motion $m$ carries a subset $F$ of the plane to itself, we call it a symmetry of $F$. The set of all symmetries of $F$ always forms a subgroup $G$ of $M$, called the
group of symmetries of the figure. The fact that $G$ is a subgroup is clear: If $m$ and $m'$
carry $F$ to $F$, then so does the composed map $mm'$, and so on.

The group of symmetries of the bilaterally symmetric Figure (1.1) consists of
two elements: the identity transformation 1 and the reflection $r$ about a line called
the axis of symmetry. We have the relation $rr = 1$, which shows that $G$ is a cyclic
group of order 2, as it must be, because there is no other group of order 2.

The group of symmetries of Figure (1.3) is an infinite cyclic group generated
by the motion which carries it one unit to the left. We call such a motion a translation $t$:

$$G = \{\ldots, t^{-2}, t^{-1}, 1, t, t^2, \ldots\}.$$

The symmetry groups of Figures (1.4, 1.6, 1.7) contain elements besides translations
and are therefore larger. Do the exercise of describing their elements.


2. THE GROUP OF MOTIONS OF THE PLANE 

This section describes the group $M$ of all rigid motions of the plane. The coarsest
classification of motions is into the orientation-preserving motions, those which do
not flip the plane over, and the orientation-reversing motions which do flip it over
(see Chapter 4, Section 5). We can use this partition of $M$ to define a map

$$M \longrightarrow \{\pm 1\},$$

by sending the orientation-preserving motions to 1 and the orientation-reversing
motions to -1. You will convince yourself without difficulty that this map is a homomorphism: The product of two orientation-reversing motions is orientation-preserving, and so on.

A finer classification of the motions is as follows:

(2.1)

(a) The orientation-preserving motions:

(i) Translation: parallel motion of the plane by a vector $a$: $p \mapsto p + a$.

(ii) Rotation: rotates the plane by an angle $\theta \neq 0$ about some point.

(b) The orientation-reversing motions:

(i) Reflection about a line $\ell$.

(ii) Glide reflection: obtained by reflecting about a line $\ell$, and then translating
by a nonzero vector $a$ parallel to $\ell$.

(2.2) Theorem. The above list is complete. Every rigid motion is a translation, a
rotation, a reflection, a glide reflection, or the identity.

This theorem is remarkable. One consequence is that the composition of rotations
about two different points is a rotation about a third point, unless it is a translation.
This fact follows from the theorem, because the composition preserves orientation,
but it is not obvious.

Some of the other compositions are easier to visualize. The composition of rotations through angles $\theta$ and $\eta$ about the same point is again a rotation, through the
angle $\theta + \eta$, about that point. The composition of translations by the vectors $a$ and
$b$ is the translation by their sum $a + b$.

Note that a translation does not leave any point fixed (unless the vector $a$ is
zero, in which case it is the identity map). Glides do not have fixed points either. On
the other hand, a rotation fixes exactly one point, the center of rotation, and a
reflection fixes the points on the line of reflection. Hence the composition of
reflections about two nonparallel lines $\ell_1$, $\ell_2$ is a rotation about the intersection point
$p = \ell_1 \cap \ell_2$. This follows from the theorem, because the composition does fix $p$,
and it is orientation-preserving. The composition of two reflections about parallel
lines is a translation by a vector orthogonal to the lines.
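
The two statements about compositions of reflections can be checked numerically. The sketch below is only an illustration under stated assumptions: it identifies the plane with the complex numbers (a convenience that the text does not use), and the helper name `reflection` and the sample points and angles are hypothetical. It also relies on the standard fact that two mirrors meeting at an angle $\delta$ compose to a rotation through $2\delta$.

```python
import cmath

def reflection(q, alpha):
    """Reflection about the line through q that makes angle alpha with the x1-axis.

    Points of the plane are stored as complex numbers x1 + i*x2 (an assumption
    of this sketch, not notation from the text).
    """
    u2 = cmath.exp(2j * alpha)
    return lambda z: q + u2 * (z - q).conjugate()

# Two nonparallel lines through the same point p.
p = 1 + 1j
r1 = reflection(p, 0.2)
r2 = reflection(p, 0.9)
z = 4 - 2.5j
w = r2(r1(z))                                   # r1 acts first, then r2
rotated = p + cmath.exp(2j * (0.9 - 0.2)) * (z - p)
print(abs(r2(r1(p)) - p) < 1e-12)               # the composition fixes p
print(abs(w - rotated) < 1e-12)                 # and rotates about p by twice the angle between the lines

# Two parallel lines with direction angle 0.2, through 0 and through 2i.
r3 = reflection(0j, 0.2)
r4 = reflection(2j, 0.2)
v = r4(r3(z)) - z                               # displacement of z
u = cmath.exp(0.2j)                             # direction of the lines
print(abs((r4(r3(0.3 + 7j)) - (0.3 + 7j)) - v) < 1e-12)   # same displacement for every point
print(abs((v * u.conjugate()).real) < 1e-12)              # and it is orthogonal to the lines
```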

In order to prove Theorem (2.2), and also to be able to compute conveniently
in the group $M$, we are going to choose some special motions as generators for the
group. We will obtain defining relations similar to the relations (1.18) in Chapter 2
which define the symmetric group $S_3$, but since $M$ is infinite, there will be more of
them.

Let us identify the plane with the space $\mathbb{R}^2$ of column vectors, by choosing a
coordinate system. Having done this, we choose as generators the translations, the
rotations about the origin, and the reflection about the $x_1$-axis:

(2.3)

(a) Translation $t_a$ by a vector $a$: $t_a(x) = x + a = \begin{bmatrix} x_1 + a_1 \\ x_2 + a_2 \end{bmatrix}$.

(b) Rotation $\rho_\theta$ by an angle $\theta$ about the origin: $\rho_\theta(x) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$.

(c) Reflection $r$ about the $x_1$-axis: $r(x) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ -x_2 \end{bmatrix}$.

Since they fix the origin, the rotations $\rho_\theta$ and the reflection $r$ are orthogonal operators on $\mathbb{R}^2$. A translation is not a linear operator: it does not send zero to itself, except of course for translation by the zero vector.

The motions (2.3) are not all of the elements of $M$. For example, rotation
about a point other than the origin is not listed, nor are reflections about other lines.
However, they do generate the group: Every element of $M$ is a product of such elements. It is easily seen that any rigid motion $m$ can be obtained by composing them.
Either

(2.4) $m = t_a \rho_\theta$ or else $m = t_a \rho_\theta r$,

for some vector $a$ and angle $\theta$, possibly zero. To see this, we recall that every rigid
motion is the composition of an orthogonal operator followed by a translation
[Chapter 4 (5.20)]. So we can write $m$ in the form $m = t_a m'$, where $m'$ is an orthogonal operator. Next, if $\det m' = 1$, then it is one of the rotations $\rho_\theta$. This follows from Theorem (5.5) of Chapter 4. So in this case, $m = t_a \rho_\theta$. Finally, if
$\det m' = -1$, then $\det m'r = 1$, so $m'r$ is a rotation $\rho_\theta$. Since $r^2 = 1$, $m' = \rho_\theta r$ in
this case, and $m = t_a \rho_\theta r$.

The expression of a motion $m$ as a product (2.4) is unique. For suppose that $m$
is expressed in two ways: $m = t_a \rho_\theta r^i = t_b \rho_\eta r^j$, where $i$, $j$ are 0 or 1. Since $m$ is
orientation-preserving if $i = 0$ and orientation-reversing if $i = 1$, we must have
$i = j$, and so we can cancel $r$ from both sides if necessary, to obtain the equality
$t_a \rho_\theta = t_b \rho_\eta$. Multiplying both sides on the left by $t_{-b}$ and on the right by $\rho_{-\theta}$, we
find $t_{a-b} = \rho_{\eta-\theta}$. But a translation is not a rotation unless both are the trivial operations. So $a = b$ and $\theta = \eta$. □

Computation in $M$ can be done with the symbols $t_a$, $\rho_\theta$, $r$ using rules for composing them which can be calculated from the formulas (2.3). The necessary rules
are as follows:

(2.5)

$t_a t_b = t_{a+b}$, $\rho_\theta \rho_\eta = \rho_{\theta+\eta}$, $rr = 1$,
$\rho_\theta t_a = t_{a'} \rho_\theta$, where $a' = \rho_\theta(a)$,
$r t_a = t_{a'} r$, where $a' = r(a)$,
$r \rho_\theta = \rho_{-\theta} r$.

Using these rules, we can reduce any product of our generators to one of the two
forms (2.4). The form we get is uniquely determined, because there is only one expression of the form (2.4) for a given motion.
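
As an illustration of how the composition rules reduce a product to the normal form (2.4), here is a minimal sketch. It rests on assumptions that are not in the text: a motion $t_a \rho_\theta r^i$ is stored as a triple `(a, theta, i)`, vectors are encoded as complex numbers, and the names `apply` and `compose` are hypothetical. The rules give $(t_a \rho_\theta r^i)(t_b \rho_\eta r^j) = t_{a + \rho_\theta r^i(b)}\, \rho_{\theta \pm \eta}\, r^{i+j}$, with the minus sign when $i = 1$.

```python
import cmath
import math

# A motion t_a rho_theta r^i is stored as (a, theta, i): a is a complex number
# standing for the vector (a1, a2), theta is in radians, and i is 0 or 1.

def apply(m, z):
    """Apply the motion to the point z: first r^i, then rho_theta, then t_a."""
    a, theta, i = m
    if i == 1:
        z = z.conjugate()                  # r: reflection about the x1-axis
    return cmath.exp(1j * theta) * z + a

def compose(m1, m2):
    """Normal form of the product m1 m2 (m2 acts first), using the composition rules."""
    a, theta, i = m1
    b, eta, j = m2
    b1 = b.conjugate() if i == 1 else b    # r t_b = t_{r(b)} r
    b2 = cmath.exp(1j * theta) * b1        # rho_theta t_b = t_{rho_theta(b)} rho_theta
    sign = -1 if i == 1 else 1             # r rho_eta = rho_{-eta} r
    return (a + b2, theta + sign * eta, (i + j) % 2)

m1 = (1 + 2j, math.pi / 3, 1)
m2 = (-0.5 + 1j, math.pi / 5, 0)
z = 0.7 - 1.3j
lhs = apply(compose(m1, m2), z)            # the reduced product applied to z
rhs = apply(m1, apply(m2, z))              # the two motions applied one after the other
print(abs(lhs - rhs) < 1e-12)
```

The angle is not reduced modulo $2\pi$ in this sketch, so two triples can describe the same motion while differing in `theta` by a multiple of $2\pi$.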

Proof of Theorem (2.2). Let $m$ be a rigid motion which preserves orientation but is
not a translation. We want to prove that $m$ is a rotation about some point. It is clear
that an orientation-preserving motion which fixes a point $p$ in the plane must be a rotation about $p$. So we must show that every orientation-preserving motion $m$ which
is not a translation fixes some point. We write $m = t_a \rho_\theta$ as in (2.4). By assumption,
$\theta \neq 0$. One can use the geometric picture in Figure (2.6) to find the fixed point. In
it, $\ell$ is the line through the origin and perpendicular to $a$, and the sector with angle
$\theta$ is situated so as to be bisected by $\ell$. The point $p$ is determined by inserting the
vector $a$ into the sector, as shown. To check that $m$ fixes $p$, remember that the operation $\rho_\theta$ is the one which is made first, and is followed by $t_a$.

(2.6) Figure. The fixed point of an orientation-preserving motion.

Another way to find the fixed point is by solving the equation $x = t_a \rho_\theta(x)$
algebraically for $x$. By definition of a translation, $t_a(\rho_\theta(x)) = \rho_\theta(x) + a$. So the
equation we need to solve is

$x - \rho_\theta(x) = a$, or

(2.7) $\begin{bmatrix} 1 - \cos\theta & \sin\theta \\ -\sin\theta & 1 - \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} a_1 \\ a_2 \end{bmatrix}$.

Note that $\det(1 - \rho_\theta) = 2 - 2\cos\theta$. The determinant is not zero if $\theta \neq 0$, so
there is a unique solution for $x$.
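
The linear system for the fixed point can be solved explicitly. The following sketch does so with Cramer's rule; the function names `fixed_point` and `motion` and the sample values are hypothetical and serve only as an illustration.

```python
import math

def fixed_point(a, theta):
    """Solve (1 - rho_theta) x = a for x = (x1, x2) by Cramer's rule."""
    a1, a2 = a
    c, s = math.cos(theta), math.sin(theta)
    det = (1 - c) ** 2 + s ** 2            # equals 2 - 2 cos(theta)
    if math.isclose(det, 0.0, abs_tol=1e-15):
        raise ValueError("theta is a multiple of 2*pi: the motion is a translation")
    # Coefficient matrix: [[1 - c, s], [-s, 1 - c]]
    x1 = ((1 - c) * a1 - s * a2) / det
    x2 = (s * a1 + (1 - c) * a2) / det
    return (x1, x2)

def motion(a, theta, x):
    """Apply m = t_a rho_theta to x: rotate about the origin, then translate by a."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1] + a[0], s * x[0] + c * x[1] + a[1])

a, theta = (1.0, 0.0), math.pi / 2
p = fixed_point(a, theta)                  # close to (0.5, 0.5)
q = motion(a, theta, p)
print(math.isclose(p[0], q[0], abs_tol=1e-12) and math.isclose(p[1], q[1], abs_tol=1e-12))
```

For a quarter turn followed by translation by $(1, 0)^t$, the fixed point is $(\tfrac{1}{2}, \tfrac{1}{2})^t$: rotating it gives $(-\tfrac{1}{2}, \tfrac{1}{2})^t$, and the translation carries that back.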



(2.8) Corollary. The motion $m = t_a \rho_\theta$ is the rotation through the angle $\theta$ about
its fixed point.

Proof. As we just saw, the fixed point of $m$ is the one which satisfies the relation $p = \rho_\theta(p) + a$. Then for any $x$,

$$m(p + x) = t_a \rho_\theta(p + x) = \rho_\theta(p + x) + a = \rho_\theta(p) + \rho_\theta(x) + a = p + \rho_\theta(x).$$

Thus $m$ sends $p + x$ to $p + \rho_\theta(x)$. So it is the rotation about $p$ through the angle $\theta$,
as required. □


Next, we will show that any orientation-reversing motion $m = t_a \rho_\theta r$ is a glide
reflection or a reflection (which we may consider to be a glide reflection having glide
vector zero). We do this by finding a line $\ell$ which is sent to itself by $m$, and so that
the motion of $m$ on $\ell$ is a translation. It is clear geometrically that an orientation-reversing motion which acts in this way on a line is a glide reflection.

The geometry is more complicated here, so we will reduce the problem in two
steps. First, the motion $\rho_\theta r = r'$ is a reflection about a line. The line is the one
which intersects the $x_1$-axis at an angle of $\tfrac{1}{2}\theta$ at the origin. This is not hard to see,
geometrically or algebraically. So our motion $m$ is the product of the translation $t_a$
and the reflection $r'$. We may as well rotate coordinates so that the $x_1$-axis becomes
the line of reflection of $r'$. Then $r'$ becomes our standard reflection $r$, and the translation $t_a$ remains a translation, though the coordinates of the vector $a$ will have
changed. In this new coordinate system, the motion is written as $m = t_a r$, and it
acts as

$$m \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 + a_1 \\ -x_2 + a_2 \end{bmatrix}.$$

This motion sends the line $x_2 = \tfrac{1}{2}a_2$ to itself, by the translation $(x_1, \tfrac{1}{2}a_2)^t \mapsto (x_1 + a_1, \tfrac{1}{2}a_2)^t$, and so $m$ is a glide along this line. □
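
The same reduction can be carried out without changing coordinates. The sketch below, for a general $m = t_a \rho_\theta r$, splits $a$ into a component along the mirror of $\rho_\theta r$ and a component perpendicular to it; half of the perpendicular component locates the glide line, and the parallel component is the glide vector. The complex-number encoding of the plane and the names `glide_data` and `m` are assumptions of this illustration, not notation from the text.

```python
import cmath

def glide_data(a, theta):
    """For m = t_a rho_theta r, return (q, u, g): the glide line passes through q
    with unit direction u, and m moves each point of that line by the vector g."""
    u = cmath.exp(1j * theta / 2)          # direction of the mirror of rho_theta r
    along = (a * u.conjugate()).real       # component of a along u
    g = along * u                          # glide vector, parallel to the line
    q = (a - g) / 2                        # half of the perpendicular component of a
    return q, u, g

def m(a, theta, z):
    """Apply t_a rho_theta r: reflect about the x1-axis, rotate, then translate."""
    return cmath.exp(1j * theta) * z.conjugate() + a

# Normalized case theta = 0, so m = t_a r with a = (3, 2).
q0, u0, g0 = glide_data(3 + 2j, 0.0)
print(q0, g0)      # q0 lies on the line x2 = a2/2 = 1; g0 is the vector (a1, 0) = (3, 0)

# General case: points of the glide line are translated by g.
a, theta = 3 + 2j, 0.8
q, u, g = glide_data(a, theta)
print(all(abs(m(a, theta, q + s * u) - (q + s * u + g)) < 1e-12 for s in (-2.0, 0.0, 1.5)))
```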


There are two important subgroups of $M$ for which we must introduce
notation:

(2.9)

$T$, the group of translations.

$O$, the group of orthogonal operators.

The group $O$ consists of the motions leaving the origin fixed. It contains the rotations
about the origin and reflections about lines through the origin.

Notice that with our choice of coordinates we get a bijective correspondence

(2.10) $\mathbb{R}^2 \to T$, $a \mapsto t_a$.

This is an isomorphism of the additive group $(\mathbb{R}^2)^+$ with the subgroup $T$, because
$t_a t_b = t_{a+b}$.

The elements of $O$ are linear operators. Again making use of our choice of coordinates, we can associate an element $m \in O$ to its matrix. Doing so, we obtain an
isomorphism

$$O_2 \to O$$

from the group $O_2$ of orthogonal 2x2 matrices to $O$ [see Chapter 4 (5.16)].

We can also consider the subgroup of M of motions fixing a point of the plane 
other than the origin. This subgroup is related to O as follows: 


(2.11) Proposition.

(a) Let $p$ be a point of the plane. Let $\rho_\theta'$ denote rotation through the angle $\theta$ about
$p$, and let $r'$ denote reflection about the line through $p$ and parallel to the
x-axis. Then $\rho_\theta' = t_p \rho_\theta t_p^{-1}$ and $r' = t_p r t_p^{-1}$.

(b) The subgroup of $M$ of motions fixing $p$ is the conjugate subgroup

$$O' = t_p O t_p^{-1}.$$

Proof. We can obtain the rotation $\rho_\theta'$ in this way: First translate $p$ to the
origin, next rotate the plane about the origin through the angle $\theta$, and finally translate the origin back to $p$:

$$\rho_\theta' = t_p \rho_\theta t_{-p} = t_p \rho_\theta t_p^{-1}.$$

The reflection $r'$ can be obtained in the same way from $r$:

$$r' = t_p r t_{-p} = t_p r t_p^{-1}.$$

This proves (a). Since every motion fixing $p$ has the form $\rho_\theta'$ or $\rho_\theta' r'$ [see the proof
of (2.4)], (b) follows from (a). □


There is an important homomorphism $\varphi$ from $M$ to $O$ whose kernel is $T$, which
is obtained by dropping the translation from the products (2.4):

(2.12) $t_a \rho_\theta \mapsto \rho_\theta$, $\quad t_a \rho_\theta r \mapsto \rho_\theta r$.

This may look too naive to be a good definition, but formulas (2.5) show that $\varphi$ is a
homomorphism: $(t_a \rho_\theta)(t_b \rho_\eta) = t_a t_{b'} \rho_\theta \rho_\eta = t_{a+b'} \rho_{\theta+\eta}$, where $b' = \rho_\theta(b)$; hence $\varphi(t_a \rho_\theta t_b \rho_\eta) = \rho_{\theta+\eta}$,
and so on. Since $T$ is the kernel of a homomorphism, it is a normal subgroup of $M$.
Note that we cannot define a homomorphism from $M$ to $T$ in this way.

(2.13) Proposition. Let $p$ be any point of the plane, and let $\rho_\theta'$ denote rotation
through the angle $\theta$ about $p$. Then $\varphi(\rho_\theta') = \rho_\theta$. Similarly, if $r'$ is reflection about
the line through $p$ and parallel to the x-axis, then $\varphi(r') = r$.

This follows from (2.11a), because $t_p$ is in the kernel of $\varphi$. The proposition can
also be expressed as follows:

(2.14) The homomorphism $\varphi$ does not depend on the choice of origin. □


3. FINITE GROUPS OF MOTIONS 


In this section we investigate the possible finite groups of symmetry of figures such
as (1.1) and (1.2). So we are led to the study of finite subgroups $G$ of the group $M$
of rigid motions of the plane.

The key observation which allows us to describe all finite subgroups is the following theorem.

(3.1) Theorem. Fixed Point Theorem: Let $G$ be a finite subgroup of the group of
motions $M$. There is a point $p$ in the plane which is left fixed by every element of $G$,
that is, there is a point $p$ such that $g(p) = p$ for all $g \in G$.

It follows, for example, that any subgroup of $M$ which contains rotations about
two different points is infinite.

Here is a beautiful geometric proof of the theorem. Let $s$ be any point in the
plane, and let $S$ be the set of points which are the images of $s$ under the various motions in $G$. So each element $s' \in S$ has the form $s' = g(s)$ for some $g \in G$. This
set is called the orbit of $s$ under the action of $G$. The element $s$ is in the orbit because
the identity element 1 is in $G$, and $s = 1(s)$. A typical orbit is depicted below, for
the case that $G$ is the group of symmetries of a regular pentagon.



• P 



s 


Any element of the group G will permute the orbit 5. In other words，if 
s f E S and x S G, then x(s f ) E ： S. For, say that s f = g(s), with g B G. Since G 
is a group, xg E G. Therefore, by definition ， xg(s) E 5. Since xg(s) = jc( 〆 ）， this 
shows that x(s f ) E ： S. 

We list the elements of S arbitrarily, writing $S = \{s_1, \ldots, s_n\}$. The fixed point we are looking for is the center of gravity of the orbit, defined as

(3.2) $p = \frac{1}{n}(s_1 + \cdots + s_n)$,

where the right side is computed by vector addition, using an arbitrary coordinate system in the plane. The center of gravity should be considered an average of the points $s_1, \ldots, s_n$.

(3.3) Lemma. Let $S = \{s_1, \ldots, s_n\}$ be a finite set of points of the plane, and let p be its center of gravity, defined by (3.2). Let m be a rigid motion, and let $m(s_i) = s_i'$ and $m(p) = p'$. Then $p' = \frac{1}{n}(s_1' + \cdots + s_n')$. In other words, rigid motions carry centers of gravity to centers of gravity.

Proof. This is clear by physical reasoning. It can also be shown by calculation. To do so, it suffices to treat separately the cases $m = t_a$, $m = \rho_\theta$, and $m = r$, since any motion is obtained from these by composition.

Case 1: $m = t_a$. Then $p' = p + a$ and $s_i' = s_i + a$. It is true that

$p + a = \frac{1}{n}\bigl((s_1 + a) + \cdots + (s_n + a)\bigr)$.

Case 2: $m = \rho_\theta$ or r. Then m is a linear operator. Therefore

$p' = m\bigl(\frac{1}{n}(s_1 + \cdots + s_n)\bigr) = \frac{1}{n}\bigl(m(s_1) + \cdots + m(s_n)\bigr) = \frac{1}{n}(s_1' + \cdots + s_n')$. □

The center of gravity of our set S is a fixed point for the action of G. For, any element $g_i$ of G permutes the orbit $\{s_1, \ldots, s_n\}$, so Lemma (3.3) shows that it sends the center of gravity to itself. This completes the proof of the theorem. □
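
The construction in this proof is easy to try out. The sketch below assumes a concrete group, the symmetries of a regular pentagon centered at an arbitrarily chosen point, and models the plane as the complex numbers; the function names and the sample values are hypothetical.

```python
import cmath
import math

def dihedral_about(center, n):
    """The 2n symmetries of a regular n-gon centered at `center`, as maps of the plane."""
    motions = []
    for k in range(n):
        w = cmath.exp(2j * math.pi * k / n)
        # rotation about `center` through the angle 2*pi*k/n
        motions.append(lambda z, w=w: center + w * (z - center))
        # reflection about a line through `center`
        motions.append(lambda z, w=w: center + w * (z - center).conjugate())
    return motions

def center_of_gravity(points):
    """Formula (3.2): the average of the points."""
    return sum(points) / len(points)

center = 2 + 1j
G = dihedral_about(center, 5)
s = 3.7 - 0.4j                        # an arbitrary starting point
orbit = [g(s) for g in G]             # images of s, one for each element of G
p = center_of_gravity(orbit)
print(abs(p - center) < 1e-9)                     # the average is the center
print(all(abs(g(p) - p) < 1e-9 for g in G))       # and every g fixes it
```

The list `orbit` has one entry per group element. If s happened to lie on a line of reflection, each orbit point would be listed the same number of times, so the average would not change.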

Now let G be a finite subgroup of M. Theorem (3.1) tells us that there is a point fixed by every element of G, and we may adjust coordinates so that this point is the origin. Then G will be a subgroup of O. So to describe the finite subgroups G of M, we need only describe the finite subgroups of O (or, since O is isomorphic to the group of orthogonal 2 × 2 matrices, the finite subgroups of the orthogonal group $O_2$). These subgroups are described in the following theorem.

(3.4) Theorem. Let G be a finite subgroup of the group O of rigid motions which fix the origin. Then G is one of the following groups:

(a) $G = C_n$: the cyclic group of order n, generated by the rotation $\rho_\theta$, where $\theta = 2\pi/n$.

(b) $G = D_n$: the dihedral group of order 2n, generated by two elements: the rotation $\rho_\theta$, where $\theta = 2\pi/n$, and a reflection $r'$ about a line through the origin.

The proof of this theorem is at the end of the section. 

The group $D_n$ depends on the line of reflection, but of course we may choose coordinates so that it becomes the x-axis, and then $r'$ becomes our standard reflection r. If G were given as a finite subgroup of M, we would also need to shift the origin to the fixed point in order to apply Theorem (3.4). So our end result about finite groups of motions is the following corollary:

(3.5) Corollary. Let G be a finite subgroup of the group of motions M. If coordinates are introduced suitably, then G becomes one of the groups $C_n$ or $D_n$, where $C_n$ is generated by $\rho_\theta$, $\theta = 2\pi/n$, and $D_n$ is generated by $\rho_\theta$ and r. □

When $n \ge 3$, the dihedral group $D_n$ is the group of symmetries of a regular n-sided polygon. This is easy to see, and in fact it follows from the theorem. For a regular n-gon has a group of symmetries which contains the rotation by $2\pi/n$ about its center. It also contains some reflections. Theorem (3.4) tells us that it is $D_n$.

The dihedral groups $D_1$, $D_2$ are too small to be symmetry groups of an n-gon in the usual sense. $D_1$ is the group {1, r} of two elements. So it is a cyclic group, as is $C_2$. But the nontrivial element of $D_1$ is a reflection, while in $C_2$ it is rotation through the angle $\pi$. The group $D_2$ contains the four elements $\{1, \rho, r, \rho r\}$, where $\rho = \rho_\pi$. It is isomorphic to the Klein four group. If we like, we can think of $D_1$ and $D_2$ as groups of symmetry of the 1-gon and 2-gon:

The dihedral groups are important examples, and it will be useful to have a complete set of defining relations for them. They can be read off from the list of defining relations for M (2.5). Let us denote the rotation $\rho_\theta$ ($\theta = 2\pi/n$) by x, and the reflection r by y.

(3.6) Proposition. The dihedral group $D_n$ is generated by two elements x, y which satisfy the relations

$x^n = 1$, $\quad y^2 = 1$, $\quad yx = x^{-1}y$.

The elements of $D_n$ are

$\{1, x, x^2, \ldots, x^{n-1};\ y, xy, x^2y, \ldots, x^{n-1}y\} = \{x^i y^j \mid 0 \le i < n,\ 0 \le j < 2\}$.

Proof. The elements $x = \rho_\theta$ and y = r generate $D_n$ by definition of the group. The relations $y^2 = 1$ and $yx = x^{-1}y$ are included in the list of relations (2.5) for M: They are rr = 1 and $r\rho_\theta = \rho_{-\theta}r$. The relation $x^n = 1$ follows from the fact that $\theta = 2\pi/n$, which also shows that the elements $1, x, \ldots, x^{n-1}$ are distinct. It follows that the elements $y, xy, x^2y, \ldots, x^{n-1}y$ are also distinct and, since they are reflections while the powers of x are rotations, that there is no repetition in the list of elements. Finally, the relations can be used to reduce any product of $x, y, x^{-1}, y^{-1}$ to the form $x^i y^j$, with $0 \le i < n$, $0 \le j < 2$. Therefore the list contains all elements of the group generated by x, y, and since these elements generate $D_n$ the list is complete. □

Using the first two relations (3.6), the third relation can be written in various ways. It is equivalent to

(3.7) $yx = x^{n-1}y$ and also to $xyxy = 1$.

Note that when n = 3, the relations are the same as for the symmetric group $S_3$ [Chapter 2 (1.18)].

(3.8) Corollary. The dihedral group $D_3$ and the symmetric group $S_3$ are isomorphic. □

For n > 3, the dihedral and symmetric groups are certainly not isomorphic, because $D_n$ has order 2n, while $S_n$ has order n!.
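
The normal form $x^i y^j$ of (3.6) gives a compact way to compute in $D_n$. The following sketch stores $x^i y^j$ as the pair `(i, j)` and multiplies by moving y past powers of x with the relation $yx = x^{-1}y$; the encoding and the function names are assumptions for illustration, not notation from the text.

```python
def dihedral(n):
    """The 2n elements x^i y^j of D_n, stored as pairs (i, j)."""
    return [(i, j) for j in range(2) for i in range(n)]

def mul(g, h, n):
    """Product in normal form: x^i y^j x^k y^l = x^(i + (-1)^j k) y^(j + l)."""
    (i, j), (k, l) = g, h
    sign = -1 if j else 1
    return ((i + sign * k) % n, (j + l) % 2)

def power(g, e, n):
    """g multiplied by itself e times (e >= 0)."""
    result = (0, 0)
    for _ in range(e):
        result = mul(result, g, n)
    return result

n = 5
one, x, y = (0, 0), (1, 0), (0, 1)
x_inv = power(x, n - 1, n)
print(power(x, n, n) == one, power(y, 2, n) == one)   # x^n = 1 and y^2 = 1
print(mul(y, x, n) == mul(x_inv, y, n))               # yx = x^(n-1) y, as in (3.7)
xy = mul(x, y, n)
print(mul(xy, xy, n) == one)                          # xyxy = 1, as in (3.7)
print(len(dihedral(3)), len(dihedral(4)))             # orders 2n for n = 3, 4
```

Since the multiplication rule is built from the relations themselves, these checks only confirm that the bookkeeping is consistent; the orders printed last can be compared by hand with 3! = 6 and 4! = 24.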

Proof of Theorem (3.4). Let G be a finite subgroup of O. We need to remember that the elements of O are the rotations $\rho_\theta$ and the reflections $\rho_\theta r$.

Case 1: All elements of G are rotations. We must prove that G is cyclic in this case. The proof is similar to the determination of the subgroups of the additive group $\mathbb{Z}^+$ of integers [Chapter 2 (2.3)]. If G = {1}, then $G = C_1$. Otherwise G contains a nontrivial rotation $\rho_\theta$. Let $\theta$ be the smallest positive angle of rotation among the elements of G. Then G is generated by $\rho_\theta$. For let $\rho_\alpha$ be any element of G, where the angle of rotation $\alpha$ is represented as usual by a real number. Let $n\theta$ be the greatest integer multiple of $\theta$ which is not greater than $\alpha$, so that $\alpha = n\theta + \beta$, with $0 \le \beta < \theta$. Since G is a group and since $\rho_\alpha$ and $\rho_\theta$ are in G, the product $\rho_\beta = \rho_\alpha \rho_{-n\theta}$ is also in G. But by assumption $\theta$ is the smallest positive angle of rotation in G. Therefore $\beta = 0$ and $\alpha = n\theta$. This shows that G is cyclic. Let $n\theta$ be the smallest multiple of $\theta$ which is $\ge 2\pi$, so that $2\pi \le n\theta < 2\pi + \theta$. Since $\theta$ is the smallest positive angle of rotation in G, $n\theta = 2\pi$. Thus $\theta = 2\pi/n$ for some integer n.

Case 2: G contains a reflection. Adjusting coordinates as necessary, we may assume that our standard reflection r is in G. Let H denote the subgroup of rotations in G. We can apply what has been proved in Case 1 to the group H, to conclude that it is a cyclic group: $H = C_n$. Then the 2n products $\rho_\theta^i$, $\rho_\theta^i r$, $0 \le i \le n - 1$, are in G, and so G contains the dihedral group $D_n$. We must show that $G = D_n$. Now if an element g of G is a rotation, then $g \in H$ by definition of H; hence g is one of the elements of $D_n$. If g is a reflection, we can write it in the form $\rho_\alpha r$ for some rotation $\rho_\alpha$ (2.8). Since r is in G, so is the product $\rho_\alpha r r = \rho_\alpha$. Therefore $\rho_\alpha$ is a power of $\rho_\theta$, and g is in $D_n$ too. So $G = D_n$. This completes the proof of the theorem. □

4. DISCRETE GROUPS OF MOTIONS 

In this section we will discuss the symmetry groups of unbounded figures such as wallpaper patterns. Our first task is to describe a substitute for the condition that the group is finite, one which includes the groups of symmetry of interesting unbounded figures. Now one property which the patterns illustrated in the text have is that they do not admit arbitrarily small translations or rotations. Very special figures such as a line have arbitrarily small translational symmetries, and a circle, for example, has arbitrarily small rotational symmetries. It turns out that if such figures are ruled out, then the groups of symmetry can be classified.

(4.1) Definition. A subgroup G of the group of motions M is called discrete if it does not contain arbitrarily small translations or rotations. More precisely, G is discrete if there is some real number $\epsilon > 0$ so that

(i) if $t_a$ is a translation in G by a nonzero vector a, then the length of a is at least $\epsilon$: $|a| \ge \epsilon$;

(ii) if $\rho$ is a rotation in G about some point through a nonzero angle $\theta$, then the angle $\theta$ is at least $\epsilon$: $|\theta| \ge \epsilon$.

Since the translations and rotations are all the orientation-preserving motions (2.1), this condition applies to all orientation-preserving elements of G. We do not impose a condition on the reflections and glides. The one we might ask for follows automatically from the condition imposed on orientation-preserving motions.

The kaleidoscope principle can be used to show that every discrete group of motions is the group of symmetries of a plane figure. We are not going to give precise reasoning to show this, but the method can be made into a proof. Start with a sufficiently random figure R in the plane. We require in particular that R shall not have any symmetries except for the identity. So every element g of our group will move R to a different position, call it gR. The required figure F is the union of all the figures gR. An element x of G sends gR to xgR, which is also a part of F, and hence it sends F to itself. If R is sufficiently random, G will be the group of symmetries of F. As we know from the kaleidoscope, the figure F is often very attractive. Here is the result of applying this procedure in the case that G is the dihedral group of symmetries of a regular pentagon:




Of course many figures have the same group or have similar groups of symmetry. But nevertheless it is interesting and instructive to classify figures according to their groups of symmetry. We are going to discuss a rough classification of the groups, which will be refined in the exercises.

The two main tools for studying a discrete group G are its translation group and its point group. The translation group of G is the set of vectors a such that $t_a \in G$. Since $t_a t_b = t_{a+b}$ and $t_{-a} = t_a^{-1}$, this is a subgroup of the additive group of vectors, which we will denote by $L_G$. Using our choice of coordinates, we identify the space of vectors with $\mathbb{R}^2$. Then

(4.2) $L_G = \{a \in \mathbb{R}^2 \mid t_a \in G\}$.

This group is isomorphic to the subgroup $T \cap G$ of translations in G, by the isomorphism (2.10): $a \mapsto t_a$. Since it is a subgroup of G, $T \cap G$ is discrete: A subgroup of a discrete group is discrete. If we translate this condition over to $L_G$, we find

(4.3) $L_G$ contains no vector of length $< \epsilon$, except for the zero vector.

A subgroup L of $\mathbb{R}^{n+}$ which satisfies condition (4.3) for some $\epsilon > 0$ is called a discrete subgroup of $\mathbb{R}^n$. Here the adjective discrete means that the elements of L are separated by a fixed distance:

(4.4) The distance between any two vectors $a, b \in L$ is at least $\epsilon$, if $a \ne b$.

For the distance is the length of b − a, and $b - a \in L$ because L is a subgroup.

(4.5) Proposition. Every discrete subgroup L of $\mathbb{R}^2$ has one of these forms:

(a) L = {0}.

(b) L is generated as an additive group by one nonzero vector a:

$L = \{ma \mid m \in \mathbb{Z}\}$.

(c) L is generated by two linearly independent vectors a, b:

$L = \{ma + nb \mid m, n \in \mathbb{Z}\}$.

Groups of the third type are called plane lattices, and the generating set (a, b) is called a lattice basis.

(4.6) Figure. A lattice in $\mathbb{R}^2$.


We defer the proof of Proposition (4.5) and turn to the second tool for studying a discrete group of motions, its point group. Recall that there is a homomorphism (2.13) $\varphi: M \to O$, whose kernel is T. If we restrict this homomorphism to G, we obtain a homomorphism

(4.7) $\varphi|_G: G \to O$.

Its kernel is $T \cap G$ (which is a subgroup isomorphic to the translation group $L_G$). The point group $\bar{G}$ is the image of G in O. Thus $\bar{G}$ is a subgroup of O.

By definition, a rotation $\rho_\theta$ is in $\bar{G}$ if G contains some element of the form $t_a \rho_\theta$. And we have seen (2.8) that $t_a \rho_\theta$ is a rotation through the angle $\theta$ about some point in the plane. So the inverse image of an element $\rho_\theta \in \bar{G}$ consists of all of the elements of G which are rotations through the angle $\theta$ about some point.

Similarly, let $\ell$ denote the line of reflection of $\rho_\theta r$. As we have noted before, its angle with the x-axis is $\frac{1}{2}\theta$. The point group $\bar{G}$ contains $\rho_\theta r$ if there is some element $t_a \rho_\theta r$ in G, and $t_a \rho_\theta r$ is a reflection or a glide reflection along a line parallel to $\ell$. So the inverse image of $\rho_\theta r$ consists of all elements of G which are reflections and glides along lines parallel to $\ell$.

Since G contains no small rotations, the same is true of its point group $\bar{G}$. So $\bar{G}$ is discrete too: it is a discrete subgroup of O.

(4.8) Proposition. A discrete subgroup of O is a finite group.

We leave the proof of this proposition as an exercise. □

Combining Proposition (4.8) with Theorem (3.4), we find the following:

(4.9) Corollary. The point group $\bar{G}$ of a discrete group G is cyclic or dihedral. □

Here is the key observation which relates the point group to the translation group:

(4.10) Proposition. Let G be a discrete subgroup of M, with translation group $L = L_G$ and point group $\bar{G}$. The elements of $\bar{G}$ carry the group L to itself. In other words, if $\bar{g} \in \bar{G}$ and $a \in L$, then $\bar{g}(a) \in L$.

We may restate this proposition by saying that $\bar{G}$ is contained in the group of symmetries of L, when L is regarded as a set of points in the plane $\mathbb{R}^2$. However, it is important to note that the original group G need not operate on L.

Proof. To say that $a \in L$ means that $t_a \in G$. So we have to show that if $t_a \in G$ and $\bar{g} \in \bar{G}$, then $t_{\bar{g}(a)} \in G$. Now by definition of the point group, $\bar{g}$ is the image of some element g of the group G: $\varphi(g) = \bar{g}$. We will prove the proposition by showing that $t_{\bar{g}(a)}$ is the conjugate of $t_a$ by g. We write $g = t_b \rho$ or $t_b \rho r$, where $\rho = \rho_\theta$. Then $\bar{g} = \rho$ or $\rho r$, according to the case. In the first case,

$g t_a g^{-1} = t_b \rho t_a \rho^{-1} t_{-b} = t_b t_{\rho(a)} \rho \rho^{-1} t_{-b} = t_{\rho(a)}$,

as required. The computation is similar in the other case. □
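
Both cases of this conjugation can be checked numerically. The sketch below reuses a hypothetical encoding: a motion $t_b \rho_\theta$ or $t_b \rho_\theta r$ is stored as `(b, w, flip)` with $w = e^{i\theta}$, and the plane is modeled as the complex numbers. None of these names come from the text.

```python
import cmath

def conj_if(flip, z):
    return z.conjugate() if flip else z

def product(m1, m2):
    """Normal form of the composite m1 m2 (first apply m2, then m1)."""
    a1, w1, f1 = m1
    a2, w2, f2 = m2
    return (a1 + w1 * conj_if(f1, a2), w1 * conj_if(f1, w2), f1 != f2)

def inverse(m):
    """Inverse motion; uses |w| = 1 in the orientation-reversing case."""
    a, w, flip = m
    if flip:
        return (-w * a.conjugate(), w, True)
    return (-a / w, 1 / w, False)

def conjugate_translation(g, a):
    """Normal form of g t_a g^{-1}."""
    t_a = (a, 1 + 0j, False)
    return product(product(g, t_a), inverse(g))

a = 2 - 1j
w = cmath.exp(2j * cmath.pi / 3)
for g in [(0.5 + 4j, w, False), (0.5 + 4j, w, True)]:
    vec, rot, flip = conjugate_translation(g, a)
    g_bar_of_a = g[1] * conj_if(g[2], a)   # image of a under the point-group element
    print(abs(vec - g_bar_of_a) < 1e-12, abs(rot - 1) < 1e-12, flip is False)
```

In each case the conjugate comes out as a pure translation whose vector is the image of a under $\bar{g}$, and the translation part b of g cancels.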

The following proposition describes the point groups which can arise when the translation group $L_G$ is a lattice.

(4.11) Proposition. Let $H \subset O$ be a finite subgroup of the group of symmetries of a lattice L. Then

(a) Every rotation in H has order 1, 2, 3, 4, or 6.

(b) H is one of the groups $C_n$, $D_n$ where n = 1, 2, 3, 4, or 6.

This proposition is often referred to as the Crystallographic Restriction. Notice that a rotation of order 5 is ruled out by (4.11). There is no wallpaper pattern with fivefold rotational symmetry. (However, there do exist “quasi-periodic” patterns with fivefold symmetry.)

To prove Propositions (4.5) and (4.11), we begin by noting the following simple lemma:

(4.12) Lemma. Let L be a discrete subgroup of $\mathbb{R}^2$.

(a) A bounded subset S of $\mathbb{R}^2$ contains only finitely many elements of L.

(b) If $L \ne \{0\}$, then L contains a nonzero vector of minimal length.

Proof.

(a) Recall that a subset S of $\mathbb{R}^n$ is called bounded if it is contained in some large box, or if the points of S do not have arbitrarily large coordinates. Obviously, if S is bounded, so is $L \cap S$. Now a bounded set which is infinite must contain some elements arbitrarily close to each other; that is, the elements cannot be separated by a fixed positive distance $\epsilon$. This is not the case for L, by (4.4). Thus $L \cap S$ is finite.

(b) When we say that a nonzero vector a has minimal length, we mean that every nonzero vector $v \in L$ has length at least |a|. We don't require the vector a to be uniquely determined. In fact we couldn't require this, because whenever a has minimal length, −a does too.

Assume that $L \ne \{0\}$. To prove that a vector of minimal length exists, we let $b \in L$ be any nonzero vector, and let S be the disc of radius |b| about the origin. This disc is a bounded set, so it contains finitely many elements of L, including b. We search through the nonzero vectors in this finite set to find one having minimal length. It will be the required shortest vector. □
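
The search described in part (b) can be carried out by machine when the lattice is given by two generators. The sketch below assumes that L is presented as $\{ma + nb\}$ for linearly independent a, b, which is more than the lemma itself assumes; the function name and the coefficient bound from Cramer's rule are choices made here.

```python
import math

def shortest_vector(a, b):
    """Return a nonzero vector of minimal length in the lattice {m a + n b}.

    a, b: linearly independent vectors given as pairs (x, y).
    """
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("a and b must be linearly independent")
    # A shortest vector lies in the disc of this radius, because a and b do.
    radius = min(math.hypot(*a), math.hypot(*b))
    # If v = m a + n b has |v| <= radius, Cramer's rule bounds the coefficients:
    # |m| = |det(v, b)| / |det| <= radius * |b| / |det|, and similarly for n.
    m_max = math.floor(radius * math.hypot(*b) / abs(det)) + 1
    n_max = math.floor(radius * math.hypot(*a) / abs(det)) + 1
    best = None
    for m in range(-m_max, m_max + 1):
        for n in range(-n_max, n_max + 1):
            if (m, n) == (0, 0):
                continue
            v = (m * a[0] + n * b[0], m * a[1] + n * b[1])
            if best is None or math.hypot(*v) < math.hypot(*best):
                best = v
    return best

v = shortest_vector((5, 1), (3, 1))
print(v, v[0] ** 2 + v[1] ** 2)
```

As the proof notes, the minimizer is not unique, so which of the shortest vectors is returned depends on the order of the search.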

Proof of Proposition (4.11). The second part of the proposition follows from the first, by (3.6). To prove (a), let $\theta$ be the smallest nonzero angle of rotation in H, and let a be a nonzero vector in L of minimal length. Then since H operates on L, $\rho_\theta(a)$ is also in L; hence $b = \rho_\theta(a) - a \in L$. Since a has minimal length, $|b| \ge |a|$. It follows that $\theta \ge 2\pi/6$.



Thus $\rho_\theta$ has order $\le 6$. The case that $\theta = 2\pi/5$ is also ruled out, because then the element $b' = \rho_\theta^2(a) + a$ is shorter than a:



This completes the proof. □
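
The two length estimates in this proof rest on the identities $|\rho_\theta(a) - a| = 2|a|\sin(\theta/2)$ and, for $\theta = 2\pi/5$, $|\rho_\theta^2(a) + a| = 2|a|\cos(2\pi/5) < |a|$. The sketch below tests the same idea numerically for each order n up to 12: taking $|a| = 1$ and the complex-number model of the plane (both assumptions for illustration), it looks for a nonzero vector $\rho^j(a) \pm a$ that would be shorter than a.

```python
import cmath
import math

def survives_length_test(n, tol=1e-9):
    """Necessary condition for a rotation of order n to preserve a lattice.

    Take a shortest vector a with |a| = 1. If rho = rotation by 2*pi/n preserves
    L, every rho^j(a) + a and rho^j(a) - a lies in L, so none of them may be
    nonzero and shorter than a.
    """
    a = 1 + 0j
    rho = cmath.exp(2j * math.pi / n)
    for j in range(n):
        for candidate in (rho ** j * a - a, rho ** j * a + a):
            if tol < abs(candidate) < 1 - tol:
                return False
    return True

allowed = [n for n in range(1, 13) if survives_length_test(n)]
print(allowed)   # prints [1, 2, 3, 4, 6]
```

The test only excludes orders; it does not by itself show that the surviving orders occur, and it is not a substitute for the argument above.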

Proof of Proposition (4.5). Let L be a discrete subgroup of $\mathbb{R}^2$. The possibility that L = {0} is included in the list. If $L \ne \{0\}$, there is a nonzero vector $a \in L$, and we have two possibilities:

Case 1: All vectors in L lie on one line $\ell$ through the origin. We repeat an argument used several times before, choosing a nonzero vector $a \in L$ of minimal length. We claim that L is generated by a. Let v be any element of L. Then it is a real multiple v = ra of a, since $L \subset \ell$. Take out the integer part of r, writing $r = n + r_0$, where n is an integer and $0 \le r_0 < 1$. Then $v - na = r_0 a$ has length less than |a|, and since L is a group this element is in L. Therefore $r_0 = 0$. This shows that v is an integer multiple of a, and hence that it is in the subgroup generated by a, as required.

Case 2: The elements of L do not lie on a line. Then L contains two linearly independent vectors $a', b'$. We start with an arbitrary pair of independent vectors, and we try to replace them by vectors which will generate the group L. To begin with, we replace $a'$ by a shortest nonzero vector a of L on the line $\ell$ which $a'$ spans. When this is done, the discussion of Case 1 shows that the subgroup $\ell \cap L$ is generated by a. Next, consider the parallelogram $P'$ whose vertices are $0, a, b', a + b'$:

(4.13) Figure.

Since $P'$ is a bounded set, it contains only finitely many elements of L (4.12). We may search through this finite set and choose a vector b whose distance to the line $\ell$ is as small as possible, but positive. We replace $b'$ by this vector. Let P be the parallelogram with vertices 0, a, b, a + b. We note that P contains no points of L except for its vertices. To see this, notice first that any lattice point c in P which is not a vertex must lie on one of the line segments [b, a + b] or [0, a]. Otherwise the two points c and c − a would be closer to $\ell$ than b, and one of these points would lie in $P'$. Next, the line segment [0, a] is ruled out by the fact that a is a shortest vector on $\ell$. Finally, if there were a point c on [b, a + b], then c − b would be an element of L on the segment [0, a]. The proof is completed by the following lemma.

(4.14) Lemma. Let a, b be linearly independent vectors which are elements of a subgroup L of $\mathbb{R}^2$. Suppose that the parallelogram P which they span contains no element of L other than the vertices 0, a, b, a + b. Then L is generated by a and b, that is,

$L = \{ma + nb \mid m, n \in \mathbb{Z}\}$.

Proof. Let v be an arbitrary element of L. Then since (a, b) is a basis of $\mathbb{R}^2$, v is a linear combination, say v = ra + sb, where r, s are real numbers. We take out the integer parts of r, s, writing $r = m + r_0$, $s = n + s_0$, where m, n are integers and $0 \le r_0, s_0 < 1$. Let $v_0 = r_0 a + s_0 b = v - ma - nb$. Then $v_0$ lies in the parallelogram P, and $v_0 \in L$. Hence $v_0$ is one of the vertices, and since $r_0, s_0 < 1$, it must be the origin. Thus v = ma + nb. This completes the proof of the lemma and of Proposition (4.5). □
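
The step of taking out the integer parts of r and s is a small computation. The sketch below assumes vectors with integer coordinates so that exact rational arithmetic can be used; the function name `reduce_mod_lattice` is hypothetical.

```python
from fractions import Fraction
from math import floor

def reduce_mod_lattice(v, a, b):
    """Write v = m a + n b + v0 with integers m, n and v0 = r0 a + s0 b, 0 <= r0, s0 < 1.

    v, a, b: pairs of integers, with a and b linearly independent.
    """
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("a and b must be linearly independent")
    # Cramer's rule gives the coordinates r, s of v in the basis (a, b).
    r = Fraction(v[0] * b[1] - v[1] * b[0], det)
    s = Fraction(a[0] * v[1] - a[1] * v[0], det)
    m, n = floor(r), floor(s)          # integer parts
    r0, s0 = r - m, s - n              # fractional parts, in [0, 1)
    v0 = (r0 * a[0] + s0 * b[0], r0 * a[1] + s0 * b[1])
    return m, n, v0

a, b = (2, 1), (1, 3)
print(reduce_mod_lattice((4, -3), a, b))   # v = 3a - 2b, so the remainder v0 is zero
print(reduce_mod_lattice((1, 1), a, b))    # nonzero remainder inside the parallelogram
```

A vector lies in the group generated by a and b exactly when the remainder `v0` is zero, which is the criterion the proof applies to $v_0$.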

Let L be a lattice in $\mathbb{R}^2$. An element $v \in L$ is called primitive if it is not an integer multiple of another vector in L. The preceding proof actually shows the following:

(4.15) Corollary. Let L be a lattice, and let v be a primitive element of L. There is an element $w \in L$ so that the set (v, w) is a lattice basis. □

Now let us go back to our discrete group of motions $G \subset M$ and consider the rough classification of G according to the structure of its translation group $L_G$. If $L_G$ is the trivial group, then the homomorphism from G to its point group is bijective and G is finite. We examined this case in Section 3.

The discrete groups G such that $L_G$ is infinite cyclic are the symmetry groups of frieze patterns such as (1.3). The classification of these groups is left as an exercise.

If $L_G$ is a lattice, then G is called a two-dimensional crystallographic group, or a lattice group. These groups are the groups of symmetries of wallpaper patterns and of two-dimensional crystals.

The fact that any wallpaper pattern repeats itself in two different directions is reflected in the fact that its group of symmetries will always contain two independent translations, which shows that $L_G$ is a lattice. It may also contain further elements (rotations, reflections, or glides), but the crystallographic restriction limits the possibilities and allows one to classify crystallographic groups into 17 types. The classification takes into account not only the intrinsic structure of the group, but also the type of motion that each group element represents. Representative patterns with the various types of symmetry are illustrated in Figure (4.16).

Proposition (4.11) is useful for determining the point group of a crystallographic group. For example, the brick pattern shown below has a rotational symmetry through the angle $\pi$ about the centers of the bricks. All of these rotations represent the same element of the point group $\bar{G}$. The pattern also has glide symmetry along the dotted line indicated. Therefore the point group $\bar{G}$ contains a reflection. By Proposition (4.11), $\bar{G}$ is a dihedral group. On the other hand, it is easy to see that the only nontrivial rotations in the group G of symmetries are through the angle $\pi$. Therefore $\bar{G} = D_2 = \{1, \rho_\pi, r, \rho_\pi r\}$.

The point group $\bar{G}$ and the translation group $L_G$ do not completely characterize the group G. Things are complicated by the fact that a reflection in $\bar{G}$ need not be the image of a reflection in G: it may be represented in G only by glides, as in the brick pattern illustrated above.

As a sample of the methods required to classify the two-dimensional crystallographic groups, we will describe those whose point group contains a rotation $\rho$ through the angle $\pi/2$. According to Proposition (4.11), the point group will be either $C_4$ or $D_4$. Since any element of G which represents $\rho$ is also a rotation through $\pi/2$ about some point p, we may choose p to be the origin. Then $\rho$ can be thought of as an element of G too.

(4.17) Proposition. Let G be a lattice group whose point group contains a rotation $\rho$ through the angle $\pi/2$. Choose coordinates so that the origin is a point of rotation by $\pi/2$ in G. Let a be a shortest vector in $L = L_G$, let $b = \rho(a)$, and let $c = \frac{1}{2}(a + b)$. Denote by r the reflection about the line spanned by a. Then G is generated by one of the following sets: $\{t_a, \rho\}$, $\{t_a, \rho, r\}$, $\{t_a, \rho, t_c r\}$. Thus there are three such groups.

Proof. We first note that L is a square lattice, generated by a and b. For, a is in L by hypothesis, and Proposition (4.10) asserts that $b = \rho(a)$ is also in L. These two vectors generate a square sublattice $L'$ of L. If $L \ne L'$, then according to Lemma (4.14) there is an element $w \in L$ in the square whose vertices are 0, a, b, a + b and which is not one of the vertices. But any such vector would be at a distance less than |a| from at least one of the vertices v, and the difference w − v would be in L but shorter than a, contrary to the choice of a. Thus $L = L'$, as claimed.

Now the elements $t_a$ and $\rho$ are in G, and $\rho t_a \rho^{-1} = t_b$ (2.5). So the subgroup H of G generated by the set $\{t_a, \rho\}$ contains $t_a$ and $t_b$. Hence it contains $t_w$ for every $w \in L$. The elements of this group are the products $t_w \rho^i$:

$H = \{t_w \rho^i \mid w \in L,\ 0 \le i \le 3\}$.

This is one of our groups. We now consider the possible additional elements which G may contain.

Case 1: Every element of G preserves orientation. In this case, the point group is $C_4$. Every element of G has the form $m = t_u \rho_\theta$, and if such an element is in G then $\rho_\theta$ is in the point group. So $\rho_\theta = \rho^i$ for some i, and $m\rho^{-i} = t_u \in G$ too. Therefore $u \in L$, and $m \in H$. So G = H in this case.

Case 2: G contains an orientation-reversing motion. In this case the point group is $D_4$, and it contains the reflection about the line spanned by a. We choose coordinates so that this reflection becomes our standard reflection r. Then r will be represented in G by an element of the form $m = t_u r$.

Case 2a: The element u is in L; that is, $t_u \in G$. Then $r \in G$ too, so G contains its point group $\bar{G} = D_4$. If $m' = t_w \rho_\theta$ or if $t_w \rho_\theta r$ is any element of G, then $\rho_\theta$ or $\rho_\theta r$, respectively, is in G too; hence $t_w \in G$, and $w \in L$. Therefore G is the group generated by the set $\{t_a, \rho, r\}$.

Case 2b: The element u is not in L. This is the hard case.

(4.18) Lemma. Let U be the set of vectors u such that $t_u r \in G$. Then

(a) $L + U = U$.

(b) $\rho U = U$.

(c) $U + rU \subset L$.

Proof. If $v \in L$ and $u \in U$, then $t_v$ and $t_u r$ are in G; hence $t_v t_u r = t_{v+u} r \in G$. This shows that $u + v \in U$ and proves (a). Next, suppose that $u \in U$. Then $\rho t_u r \rho = t_{\rho u} \rho r \rho = t_{\rho u} r \in G$. This shows that $\rho u \in U$ and proves (b). Finally, if $u, v \in U$, then $t_u r t_v r = t_{u+rv} \in G$; hence $u + rv \in L$, which proves (c). □

Part (a) of the lemma allows us to choose an element $u \in U$ lying in the square whose vertices are 0, a, b, a + b and which is not on the line segments [a, a + b] and [b, a + b]. We write u in terms of the basis (a, b), say u = xa + yb, where $0 \le x, y < 1$. Then u + ru = 2xa. Since $u + ru \in L$ by (4.18c), the possible values for x are 0, $\frac{1}{2}$. Next, $\rho u + a = (1 - y)a + xb$ lies in the square too, and the same reasoning shows that y is 0 or $\frac{1}{2}$. Thus the three possibilities for u are $\frac{1}{2}a$, $\frac{1}{2}b$, and $\frac{1}{2}(a + b) = c$. But if $u = \frac{1}{2}a$, then $\rho u = \frac{1}{2}b$, and $ru = u = \frac{1}{2}a$. So $c = \frac{1}{2}(a + b) \in L$ (4.18b,c). This is impossible because c is shorter than a. Similarly, the case $u = \frac{1}{2}b$ is impossible. So the only remaining case is u = c, which means that the group G is generated by $\{t_a, \rho, t_c r\}$. □

5. ABSTRACT SYMMETRY: GROUP OPERATIONS

The concept of symmetry may be applied to things other than geometric figures. For example, complex conjugation $(a + bi) \mapsto (a - bi)$ may be thought of as a symmetry of the complex numbers. It is compatible with most of the structure of $\mathbb{C}$: If $\bar{\alpha}$ denotes the complex conjugate of $\alpha$, then $\overline{\alpha + \beta} = \bar{\alpha} + \bar{\beta}$ and $\overline{\alpha\beta} = \bar{\alpha}\bar{\beta}$. Being compatible with addition and multiplication, conjugation is called an automorphism of the field $\mathbb{C}$. Of course, this symmetry is just the bilateral symmetry of the complex plane about the real axis, but the statement that it is an automorphism refers to its algebraic structure.

Another example of abstract “bilateral” symmetry is given by a cyclic group H of order 3. We saw in Section 3 of Chapter 2 that this group has an automorphism $\varphi$, which interchanges the two elements different from the identity.

The set of automorphisms of a group H (or of any other mathematical structure H) forms a group Aut H, the law of composition being composition of maps. Each automorphism should be thought of as a symmetry of H, in the sense that it is a permutation of the elements of H which is compatible with the structure of H. But instead of being a geometric figure with a rigid shape, the structure in this case is the group law. The group of automorphisms of the cyclic group of order 3 contains two elements: the identity map and the map $\varphi$.

So the words automorphism and symmetry are more or less synonymous, except that automorphism is used to describe a permutation of a set which preserves some algebraic structure, while symmetry often refers to a permutation which preserves a geometric structure.

These examples are special cases of a more general concept, that of an operation of a group on a set. Suppose we are given a group G and a set S. An operation of G on S is a rule for combining elements $g \in G$ and $s \in S$ to get an element gs of S. In other words, it is a law of composition, a map $G \times S \to S$, which we generally write as multiplication:

$g, s \mapsto gs$.

This rule is required to satisfy the following axioms:

(5.1)

(a) 1s = s for all s (1 is the identity of G).

(b) Associative law: $(gg')s = g(g's)$, for all $g, g' \in G$ and $s \in S$.

A set S with an operation of G is often called a G-set. This should really be called a left operation, because elements of G multiply on the left.

Examples of this concept can be found in many places. For example, let G = M be the group of all rigid motions of the plane. Then M operates on the set of points of the plane, on the set of lines in the plane, on the set of triangles in the plane, and so on. Or let G be the cyclic group {1, r} of order 2, with $r^2 = 1$. Then G operates on the set S of complex numbers, by the rule $r\alpha = \bar{\alpha}$. The fact that the axioms (5.1) hold in a given example is usually clear.

The reason that such a law of composition is called an operation is this: If we fix an element g of G but let $s \in S$ vary, then left multiplication by g defines a map from S to itself; let us denote this map by $m_g$. Thus

(5.2) $m_g: S \to S$

is defined by

$m_g(s) = gs$.

This map describes the way the element g operates on S. Note that $m_g$ is a permutation of S; that is, it is bijective. For the axioms show that it has the two-sided inverse $m_{g^{-1}}$ = multiplication by $g^{-1}$: $m_{g^{-1}}(m_g(s)) = g^{-1}(gs) = (g^{-1}g)s = 1s = s$. Interchanging the roles of g and $g^{-1}$ shows that $m_g(m_{g^{-1}}(s)) = s$ too.

The main thing that we can do to study a set S on which a group G operates is to decompose the set into orbits. Let s be an element of S. The orbit of s in S is the set

(5.3) $O_s = \{s' \in S \mid s' = gs \text{ for some } g \in G\}$.

It is a subset of S. (The orbit is often written as $Gs = \{gs \mid g \in G\}$, in analogy with the notation for cosets [Chapter 2 (6.1)]. We won't do this because Gs looks too much like the notation for a stabilizer which we are about to introduce.) If we think of elements of G as operating on S by permutations, then $O_s$ is the set of images of s under the various permutations $m_g$. Thus, if G = M is the group of motions and S is the set of triangles in the plane, the orbit $O_\Delta$ of a given triangle $\Delta$ is the set of all triangles congruent to $\Delta$. Another example of orbit was introduced when we proved the existence of a fixed point for the operation of a finite group on the plane (3.1). The orbits for a group action are equivalence classes for the relation

(5.4) $s \sim s'$ if $s' = gs$ for some $g \in G$.

The proof that this is an equivalence relation is easy, so we omit it; we made a similar
verification when we introduced cosets in Section 6 of Chapter 2. Being equivalence
classes, the orbits partition the set S:

(5.5) S is a union of disjoint orbits.

The group G operates on S by operating independently on each orbit. In other words,
an element $g \in G$ permutes the elements of each orbit and does not carry elements
of one orbit to another orbit. For example, the set of triangles of the plane can be
partitioned into congruence classes, the orbits for the action of M. A motion m permutes
each congruence class separately. Note that the orbits of an element $s$ and of
$gs$ are equal.
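
The decomposition (5.5) can be computed mechanically for a finite example. The following is a minimal sketch, not part of the text: the function name `orbits` and the sample permutation are illustrative assumptions, and the group is the cyclic group generated by one permutation of {0, ..., 5}.

```python
def orbits(generators, points):
    """Partition `points` into orbits of the group generated by `generators`.

    Each generator is a dict sending every point to its image (a permutation).
    In a finite group, following the generators forward already reaches the
    whole orbit, because inverses are powers of the generators.
    """
    remaining = set(points)
    result = []
    while remaining:
        start = min(remaining)
        orbit = {start}
        frontier = [start]
        while frontier:
            s = frontier.pop()
            for g in generators:
                t = g[s]
                if t not in orbit:
                    orbit.add(t)
                    frontier.append(t)
        result.append(sorted(orbit))
        remaining -= orbit
    return result

# Hypothetical example: the permutation (0 1 2)(3 4), which fixes 5.
g = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3, 5: 5}
parts = orbits([g], range(6))
print(parts)
```

The orbits come out pairwise disjoint and their union is the whole set, which is exactly the statement (5.5) for this small case.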

If S consists of just one orbit, we say that G operates transitively on S. This
means that every element of S is carried to every other one by some element of the 
group. Thus the group of symmetries of Figure (1.7) operates transitively on the set 
of its legs. The group M of rigid motions of the plane operates transitively on the set 
of points of the plane, and it operates transitively on the set of lines in the plane. It 
does not operate transitively on the set of triangles in the plane. 

The stabilizer of an element $s \in S$ is the subgroup $G_s$ of G of elements leaving
s fixed:

(5.6) $G_s = \{g \in G \mid gs = s\}$.

It is clear that this is a subgroup. Just as the kernel of a group homomorphism
$\varphi: G \to G'$ tells us when two elements $x, y \in G$ have the same image, namely, if
$x^{-1}y \in \ker \varphi$ [Chapter 2 (5.13)], we can describe when two elements $x, y \in G$ act
in the same way on an element $s \in S$ in terms of the stabilizer $G_s$:

(5.7) $xs = ys$ if and only if $x^{-1}y \in G_s$.

For $xs = ys$ implies $s = x^{-1}ys$, and conversely.
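
Rule (5.7) can be checked by brute force in a small case. The sketch below assumes G is the symmetric group on {0, 1, 2} acting in the obvious way; the helper names `compose`, `inverse` and `stabilizer` are illustrative and not from the text.

```python
from itertools import permutations

# G = S_3, realized as all permutations of {0, 1, 2}; g[i] is the image of i.
G = list(permutations(range(3)))

def compose(g, h):
    """Product gh: apply h first, then g, so that (gh)s = g(hs)."""
    return tuple(g[h[i]] for i in range(len(h)))

def inverse(g):
    inv = [0] * len(g)
    for i, image in enumerate(g):
        inv[image] = i
    return tuple(inv)

def stabilizer(s):
    """The set G_s of elements leaving s fixed, as in (5.6)."""
    return {g for g in G if g[s] == s}

s = 0
G_s = stabilizer(s)

# (5.7): xs = ys exactly when x^-1 y lies in G_s, tested for every pair (x, y).
rule_holds = all(
    (x[s] == y[s]) == (compose(inverse(x), y) in G_s)
    for x in G for y in G
)
print(len(G_s), rule_holds)
```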

As an example of a nontrivial stabilizer, consider the action of the group M of
rigid motions on the set of points of the plane. The stabilizer of the origin is the subgroup
O of orthogonal operators.

Or, if S is the set of triangles in the plane and $\Delta$ is a particular triangle which
happens to be equilateral, then the stabilizer of $\Delta$ is its group of symmetries, a subgroup
of M isomorphic to $D_3$ (see (3.4)). Note that when we say that a motion m stabilizes
a triangle $\Delta$, we don't mean that m fixes the points of $\Delta$. The only motion
which fixes every point of a triangle is the identity. We mean that in permuting the
set of triangles, the motion carries $\Delta$ to itself. It is important to be clear about this
distinction.


6. THE OPERATION ON COSETS 


Let H be a subgroup of a group G. We saw in Section 6 of Chapter 2 that the left
cosets $aH = \{ah \mid h \in H\}$ form a partition of the group [Chapter 2 (6.3)]. We will
call the set of left cosets the coset space and will often denote it by $G/H$, copying
this notation from that used for quotient groups when the subgroup is normal.

The fundamental observation to be made is this: Though $G/H$ is not a group
unless the subgroup H is normal, nevertheless G operates on the coset space $G/H$ in
a natural way. The operation is quite obvious: Let g be an element of the group, and
let C be a coset. Then $gC$ is defined to be the coset

(6.1) $gC = \{gc \mid c \in C\}$.

Thus if $C = aH$, then $gC$ is the coset $gaH$. It is clear that the axioms (5.1) for an
operation are satisfied.

Note that the group G operates transitively on $G/H$, because $G/H$ is the orbit
of the coset $1H = H$. The stabilizer of the coset $1H$ is the subgroup $H \subset G$. Again,
note the distinction: Multiplication by an element $h \in H$ does not act trivially on the
elements of the coset $1H$, but it sends that coset to itself.

To understand the operation on cosets, you should work carefully through the
following example. Let G be the group $D_3$ of symmetries of an equilateral triangle.
As in (3.6), it may be described by generators $x, y$ satisfying the relations $x^3 = 1$,
$y^2 = 1$, $yx = x^2y$. Let $H = \{1, y\}$. This is a subgroup of order 2. Its cosets are

(6.2) $C_1 = H = \{1, y\}$, $C_2 = \{x, xy\}$, $C_3 = \{x^2, x^2y\}$,

and G operates on $G/H = \{C_1, C_2, C_3\}$. So, as in (5.2), every element g of G determines
a permutation $m_g$ of $\{C_1, C_2, C_3\}$. The elements $x, y$ operate as

(6.3) $m_x: 1 \to 2 \to 3 \to 1$ and $m_y: 1 \to 1,\ 2 \leftrightarrow 3$.

In fact, the six elements of G yield all six permutations of three elements, and so the
map

$G \to S_3 \approx \mathrm{Perm}(G/H)$

is an isomorphism. Thus the dihedral group $G = D_3$ is isomorphic to the symmetric
group $S_3$. We already knew this.
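
The coset computation for $D_3$ can be replayed mechanically. The sketch below is an illustration under stated assumptions: elements $x^i y^j$ are encoded as pairs `(i, j)`, and the product rule in `mul` is derived from the relation $yx = x^2y = x^{-1}y$; the names `mul`, `coset` and `m` are not from the text.

```python
# Elements x^i y^j of D_3 are stored as pairs (i, j), with i mod 3 and j mod 2.
def mul(a, b):
    """(x^i y^j)(x^k y^l) = x^(i + (-1)^j k) y^(j + l), using y x = x^-1 y."""
    (i, j), (k, l) = a, b
    return ((i + (k if j == 0 else -k)) % 3, (j + l) % 2)

G = [(i, j) for i in range(3) for j in range(2)]
H = [(0, 0), (0, 1)]                                   # H = {1, y}

def coset(a):
    """The left coset aH as a frozenset of group elements."""
    return frozenset(mul(a, h) for h in H)

cosets = [coset((0, 0)), coset((1, 0)), coset((2, 0))]  # C_1, C_2, C_3

def m(g):
    """Permutation of the indices 1, 2, 3 induced by g on the cosets."""
    return tuple(cosets.index(frozenset(mul(g, c) for c in C)) + 1 for C in cosets)

x, y = (1, 0), (0, 1)
print(m(x), m(y))                  # images of C_1, C_2, C_3 under x and under y
print(len({m(g) for g in G}))      # number of distinct permutations obtained
```

Here `m(x)` lists the indices of $xC_1, xC_2, xC_3$, so a result of `(2, 3, 1)` means $C_1 \to C_2 \to C_3 \to C_1$. Six distinct permutations confirm that the map to $S_3$ is bijective.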

The following proposition relates an arbitrary group operation to the operation
on cosets:

(6.4) Proposition. Let S be a G-set, and let s be an element of S. Let H be the
stabilizer of s, and let $O_s$ be the orbit of s. There is a natural bijective map

$\varphi: G/H \to O_s$

defined by

$aH \mapsto as$.

This map is compatible with the operations of G in the sense that $\varphi(gC) = g\varphi(C)$
for every coset C and every element $g \in G$.

The proposition tells us that every group operation can be described in terms of
the operations on cosets. For example, let $S = \{v_1, v_2, v_3\}$ be the set of vertices of an
equilateral triangle, and let G be the group of its symmetries, presented as above.
The element y is a reflection which stabilizes one of the vertices of the triangle, say
$v_1$. The stabilizer of this vertex is $H = \{1, y\}$, and its orbit is S. With suitable indexing,
the set (6.2) of cosets maps to S by the map $C_i \mapsto v_i$.

Proof of Proposition (6.4). It is clear that the map $\varphi$, if it exists, will be compatible
with the operation of the group. What is not so clear is that the rule $gH \mapsto gs$
defines a map at all. Since many symbols $gH$ represent the same coset, we must
show that if a and b are group elements and if $aH = bH$, then $as = bs$ too. This is
true, because we know that $aH = bH$ if and only if $b = ah$ for some h in H
[Chapter 2 (6.5)]. And when $b = ah$, then $bs = ahs = as$, because h fixes s. Next,
the orbit of s consists of the elements $gs$, and $\varphi$ carries $gH$ to $gs$. Thus $\varphi$ maps $G/H$
onto $O_s$, and $\varphi$ is surjective. Finally, we show that $\varphi$ is injective. Suppose $aH$ and
$bH$ have the same image: $as = bs$. Then $s = a^{-1}bs$. Since H was defined to be the
stabilizer of s, this implies that $a^{-1}b = h \in H$. Thus $b = ah \in aH$, and so
$aH = bH$. This completes the proof. □

(6.5) Proposition. Let S be a G-set, and let $s \in S$. Let $s'$ be an element in the
orbit of s, say $s' = as$. Then

(a) The set of elements g of G such that $gs = s'$ is the left coset

$aG_s = \{g \in G \mid g = ah \text{ for some } h \in G_s\}$.

(b) The stabilizer of $s'$ is a conjugate subgroup of the stabilizer of s:

$G_{s'} = aG_sa^{-1} = \{g \in G \mid g = aha^{-1} \text{ for some } h \in G_s\}$.

We omit the proof. □
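
Since the proof of (6.5) is omitted, a finite check may help build confidence in both propositions. The sketch below assumes G is the symmetric group on {0, 1, 2, 3} acting on that set, with $s = 0$ and $s' = 2$ chosen arbitrarily; all helper names are illustrative.

```python
from itertools import permutations

G = list(permutations(range(4)))          # S_4 acting on S = {0, 1, 2, 3}

def compose(g, h):
    """Product gh: apply h first, then g."""
    return tuple(g[h[i]] for i in range(4))

def inverse(g):
    inv = [0] * 4
    for i, image in enumerate(g):
        inv[image] = i
    return tuple(inv)

def stabilizer(s):
    return {g for g in G if g[s] == s}

s = 0
H = stabilizer(s)

# (6.4): record, for every coset aH, the set of points "as" it is sent to.
images = {}
for a in G:
    coset = frozenset(compose(a, h) for h in H)
    images.setdefault(coset, set()).add(a[s])
well_defined = all(len(points) == 1 for points in images.values())
hits = sorted(next(iter(points)) for points in images.values())

# (6.5): choose a with as = s', then compare both descriptions.
s_prime = 2
a = next(g for g in G if g[s] == s_prime)
part_a = {compose(a, h) for h in H} == {g for g in G if g[s] == s_prime}
part_b = {compose(compose(a, h), inverse(a)) for h in H} == stabilizer(s_prime)
print(well_defined, hits, part_a, part_b)
```

Each coset lands on a single point, and the four cosets hit the four points of the orbit once each, which is the bijection of (6.4) in this case.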

As an example, let us recompute the stabilizer of a point p in the plane, for the
operation of the group of motions. We have made this computation before, in
(2.11b). We have $p = t_p(0)$, and the stabilizer of the origin is the orthogonal group
O. Thus by (6.5b),

$G_p = t_p O t_p^{-1} = t_p O t_{-p} = \{m \in M \mid m = t_p \rho_\theta t_{-p} \text{ or } m = t_p \rho_\theta r t_{-p}\}$.

We know on the other hand that $G_p$ consists of rotations and reflections about the
point p. Those are the motions fixing p. So $t_p O t_p^{-1}$ consists of these elements. This
agrees with (2.11).


7. THE COUNTING FORMULA


Let H be a subgroup of G. As we know from Chapter 2 (6.9), all the cosets of H in
G have the same number of elements: $|H| = |aH|$. Since G is a union of nonoverlapping
cosets and the number of cosets is the index, which we write as $[G : H]$ or
$|G/H|$, we have the fundamental formula for the order $|G|$ of the group G (see
[Chapter 2 (6.10)]):

(7.1) $|G| = |H|\,|G/H|$.

Now let S be a G-set. Then we can combine Proposition (6.4) with (7.1) to get
the following:

(7.2) Proposition. Counting Formula: Let $s \in S$. Then

(order of G) = (order of stabilizer)(order of orbit)

$|G| = |G_s|\,|O_s|$.

Equivalently, the order of the orbit is equal to the index of the stabilizer:

$|O_s| = [G : G_s]$.

There is one such equation for every $s \in S$. As a consequence, the order of an orbit
divides the order of the group.

A more elementary formula uses the partition of S into orbits to count its elements.
We label the different orbits which make up S in some way, say as $O_1, \ldots, O_k$. Then

(7.3) $|S| = |O_1| + |O_2| + \cdots + |O_k|$.

These simple formulas have a great number of applications.

(7.4) Example. Consider the group G of orientation-preserving symmetries of a
regular dodecahedron D. It follows from the discussion of Section 8 of Chapter 4
that these symmetries are all rotations. It is tricky to count them without error. Consider
the action of G on the set S of the faces of D. The stabilizer of a face s is the
group of rotations by multiples of $2\pi/5$ about a perpendicular through the center of
s. So the order of $G_s$ is 5. There are 12 faces, and G acts transitively on them. Thus
$|G| = 5 \cdot 12 = 60$. Or, G operates transitively on the vertices v of D. There are
three rotations, including 1, which fix a vertex, so $|G_v| = 3$. There are 20 vertices;
hence $|G| = 3 \cdot 20 = 60$, which checks. There is a similar computation for edges.
If e is an edge, then $|G_e| = 2$, so since $60 = 2 \cdot 30$, the dodecahedron has 30
edges.
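
The same three-way check can be run on a computer. The sketch below swaps in the cube for the dodecahedron, because the rotations of the cube $[-1, 1]^3$ are easy to list: they are the signed permutation matrices of determinant $+1$. That substitution, the encoding of a rotation as a pair `(p, e)`, and all function names are assumptions made for illustration.

```python
from itertools import permutations, product

def parity(p):
    """Sign of a permutation of (0, 1, 2), computed by counting inversions."""
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return -1 if inversions % 2 else 1

# A rotation is stored as (p, e) and sends v to (e[0]*v[p[0]], e[1]*v[p[1]], e[2]*v[p[2]]).
# Its determinant is parity(p) * e[0] * e[1] * e[2]; we keep determinant +1 only.
G = [(p, e) for p in permutations(range(3)) for e in product((1, -1), repeat=3)
     if parity(p) * e[0] * e[1] * e[2] == 1]

def act(g, v):
    p, e = g
    return tuple(e[i] * v[p[i]] for i in range(3))

points = [v for v in product((-1, 0, 1), repeat=3) if any(v)]
faces = [v for v in points if sum(map(abs, v)) == 1]       # face centers
edges = [v for v in points if sum(map(abs, v)) == 2]       # edge midpoints
vertices = [v for v in points if sum(map(abs, v)) == 3]

def orbit_and_stabilizer(s):
    """Return (order of the orbit of s, order of the stabilizer of s)."""
    orbit = {act(g, s) for g in G}
    stab = [g for g in G if act(g, s) == s]
    return len(orbit), len(stab)

for name, S in (("faces", faces), ("vertices", vertices), ("edges", edges)):
    n_orbit, n_stab = orbit_and_stabilizer(S[0])
    print(name, len(S), n_orbit, n_stab, n_orbit * n_stab)
```

In each row the product of orbit order and stabilizer order should equal the order of the group, as the Counting Formula (7.2) predicts.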


Following our general principle, we should study restriction of an operation of
a group G to a subgroup. Suppose that G operates on a set S, and let H be a subgroup
of G. We may restrict the operation, to get an operation of H on S. Doing so leads to
more numerical relations.

Clearly, the H-orbit of an element s will be contained in its G-orbit. So we
may take a single G-orbit and decompose it into H-orbits. We count the orders of
these H-orbits, obtaining another formula. For example, let S be the set of 12 faces
of the dodecahedron, and let H be the stabilizer of a particular face s. Then H also
fixes the face opposite to s, and so there are two H-orbits of order 1. The remaining
faces make up two orbits of order 5. In this case, (7.3) reads as follows.

12 = 1 + 1 + 5 + 5.

Or let S be the set of faces, and let K be the stabilizer of a vertex. Then K does not
fix any face, so every K-orbit has order 3:

12 = 3 + 3 + 3 + 3.

These relations give us a way of relating several subgroups of a group.

We close the section with a simple application of this procedure to the case that
the G-set is the coset space of a subgroup:

(7.5) Proposition. Let H and K be subgroups of a group G. Then the index of
$H \cap K$ in H is at most equal to the index of K in G:

$[H : H \cap K] \le [G : K]$.

Proof. To minimize confusion, let us denote the coset space $G/K$ by S, and the
coset $1K$ by s. Thus $|S| = [G : K]$. As we have already remarked, the stabilizer of
s is the subgroup K. We now restrict the action of G to the subgroup H and decompose
S into H-orbits. The stabilizer of s for this restricted operation is obviously
$H \cap K$. We don't know much about the H-orbit O of s except that it is a subset of S.
We now apply Proposition (7.2), which tells us that $|O| = [H : H \cap K]$. Therefore
$[H : H \cap K] = |O| \le |S| = [G : K]$, as required. □
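
The inequality of (7.5) can be tested exhaustively in a small group. The following sketch assumes $G = S_4$ and collects the subgroups generated by at most two elements; the function `generated` and the choice of $S_4$ are illustrative assumptions. The indices are compared after clearing denominators, so no fractions are needed.

```python
from itertools import permutations

G = list(permutations(range(4)))          # S_4, of order 24

def compose(g, h):
    return tuple(g[h[i]] for i in range(4))

def generated(gens):
    """Subgroup generated by gens, found by closing under left products.

    In a finite group this closure is already a subgroup.
    """
    identity = tuple(range(4))
    elems = {identity}
    frontier = [identity]
    while frontier:
        a = frontier.pop()
        for g in gens:
            b = compose(g, a)
            if b not in elems:
                elems.add(b)
                frontier.append(b)
    return frozenset(elems)

# Subgroups generated by one or two elements (the check does not depend on
# this list being complete).
subgroups = {generated([a, b]) for a in G for b in G}

# [H : H n K] <= [G : K]  is equivalent to  |H| |K| <= |G| |H n K|.
ok = all(
    len(H) * len(K) <= len(G) * len(H & K)
    for H in subgroups for K in subgroups
)
print(len(subgroups), ok)
```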

8. PERMUTATION REPRESENTATIONS 

By its definition, the symmetric group $S_n$ operates on the set $S = \{1, \ldots, n\}$. A permutation
representation of a group G is a homomorphism

(8.1) $\varphi: G \to S_n$.

Given any such representation, we obtain an operation of G on $S = \{1, \ldots, n\}$ by letting
$m_g$ (5.2) be the permutation $\varphi(g)$. In fact, operations of a group G on $\{1, \ldots, n\}$
correspond in a bijective way to permutation representations.

More generally, let S be any set, and denote by Perm(S) the group of its permutations.
Let G be a group.

(8.2) Proposition. There is a bijective correspondence

{operations of G on S} $\longleftrightarrow$ {homomorphisms $G \to \mathrm{Perm}(S)$}

defined in this way: Given an operation, we define $\varphi: G \to \mathrm{Perm}(S)$ by the rule
$\varphi(g) = m_g$, where $m_g$ is multiplication by g (5.2).

Let us show that $\varphi$ is a homomorphism, leaving the rest of the proof of (8.2) as
an exercise. We've already noted in Section 5 that $m_g$ is a permutation. So as defined
above, $\varphi(g) \in \mathrm{Perm}(S)$. The axiom for a homomorphism is $\varphi(xy) = \varphi(x)\varphi(y)$, or
$m_{xy} = m_x m_y$, where multiplication is composition of permutations. So we have to
show that $m_{xy}(s) = m_x(m_y(s))$ for every $s \in S$. By Definition (5.2), $m_{xy}(s) = (xy)s$
and $m_x(m_y(s)) = x(ys)$. The associative law (5.1b) for group operations shows that
$(xy)s = x(ys)$, as required. □

The isomorphism $D_3 \to S_3$ obtained in Section 6 by the action of $D_3$ on the
cosets of H (6.2) is a particular example of a permutation representation. But a homomorphism
need not be injective or surjective. If $\varphi: G \to \mathrm{Perm}(S)$ happens to be
injective, we say that the corresponding operation is faithful. So to be faithful, the
operation must have the property that $m_g \ne$ identity, unless $g = 1$, or

if $gs = s$ for every $s \in S$, then $g = 1$.

The operation of the group of motions M on the set S of equilateral triangles in the
plane is faithful, because the identity is the only motion which fixes all triangles.

The rest of this section contains a few applications of permutation representations.

(8.3) Proposition. The group $GL_2(\mathbb{F}_2)$ of invertible matrices with mod 2
coefficients is isomorphic to the symmetric group $S_3$.

Proof. Let us denote the field $\mathbb{F}_2$ by F, and the group $GL_2(\mathbb{F}_2)$ by G. We
have listed the six elements of G before [Chapter 3 (2.10)]. Let $V = F^2$ be
the space of column vectors. This space consists of the following four vectors:
$V = \{0, e_1, e_2, e_1 + e_2\}$. The group G operates on V and fixes 0, so it operates on the
set of three nonzero vectors, which form one orbit. This gives us a permutation representation
$\varphi: G \to S_3$. Now the image of $e_1$ under multiplication by a matrix
$P \in G$ is the first column of P, and similarly the image of $e_2$ is the second column
of P. Therefore P can not operate trivially on these two elements unless it is the
identity. This shows that the operation of G is faithful, and hence that the map $\varphi$ is
injective. Since both groups have order 6, $\varphi$ is an isomorphism. □
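
The argument for (8.3) is small enough to verify by enumeration. In the sketch below, matrices are stored as pairs of rows and the three nonzero vectors are indexed 0, 1, 2; this encoding and the names `apply`, `matmul` and `phi` are assumptions for illustration.

```python
from itertools import product

# Invertible 2x2 matrices over F_2, stored as ((a, b), (c, d)).
G = [((a, b), (c, d)) for a, b, c, d in product((0, 1), repeat=4)
     if (a * d - b * c) % 2 == 1]

def apply(P, v):
    """Matrix times column vector, with arithmetic mod 2."""
    (a, b), (c, d) = P
    return ((a * v[0] + b * v[1]) % 2, (c * v[0] + d * v[1]) % 2)

def matmul(P, Q):
    """Product PQ: its columns are P applied to the columns of Q."""
    c0 = apply(P, (Q[0][0], Q[1][0]))
    c1 = apply(P, (Q[0][1], Q[1][1]))
    return ((c0[0], c1[0]), (c0[1], c1[1]))

nonzero = [(1, 0), (0, 1), (1, 1)]        # e1, e2, e1 + e2

def phi(P):
    """Permutation of the indices 0, 1, 2 of the nonzero vectors induced by P."""
    return tuple(nonzero.index(apply(P, v)) for v in nonzero)

def compose(s, t):
    """Composition of permutations: apply t first, then s."""
    return tuple(s[t[i]] for i in range(3))

images = {phi(P) for P in G}
is_hom = all(phi(matmul(P, Q)) == compose(phi(P), phi(Q)) for P in G for Q in G)
print(len(G), len(images), is_hom)
```

Six matrices with six distinct images, together with the homomorphism property, give the isomorphism with $S_3$ in this encoding.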

(8.4) Proposition. Let $c_g$ denote conjugation by g, the map $c_g(x) = gxg^{-1}$. The
map $f: S_3 \to \mathrm{Aut}(S_3)$ from the symmetric group to its group of automorphisms
which is defined by the rule $g \mapsto c_g$ is bijective.

Proof. Let A denote the group of automorphisms of $S_3$. We know from Chapter
2 (3.4) that $c_g$ is an automorphism. Also, $c_{gh} = c_g c_h$ because
$c_{gh}(x) = (gh)x(gh)^{-1} = ghxh^{-1}g^{-1} = c_g(c_h(x))$ for all x. This shows that f is a homomorphism.
Now conjugation by g is the identity if and only if g is in the center of the
group. The center of $S_3$ is trivial, so f is injective.

It is to prove surjectivity of f that we look at a permutation representation of
A. The group A operates on the set $S_3$ in the obvious way; namely, if a is an automorphism
and $s \in S_3$, then $as = a(s)$. Elements of $S_3$ of different orders will be in
distinct orbits for this operation. So A operates on the subset of $S_3$ of elements of order
2. This set contains the three elements $\{y, xy, x^2y\}$. If an automorphism a fixes
both $xy$ and $y$, then it also fixes their product $xyy = x$. Since x and y generate $S_3$,
the only such automorphism is the identity. This shows that the operation of A on
$\{y, xy, x^2y\}$ is faithful and that the associated permutation representation
$A \to \mathrm{Perm}\{y, xy, x^2y\}$ is injective. So the order of A is at most 6. Since f is injective
and the order of $S_3$ is 6, it follows that f is bijective. □
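
Proposition (8.4) can also be confirmed by brute force, since $S_3$ has only 720 bijections to itself. The sketch below assumes $S_3$ is realized as permutations of {0, 1, 2}; it is a direct enumeration and not the argument of the proof, and the helper names are illustrative.

```python
from itertools import permutations

G = list(permutations(range(3)))          # S_3

def compose(g, h):
    """Product gh: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def inverse(g):
    inv = [0] * 3
    for i, image in enumerate(g):
        inv[image] = i
    return tuple(inv)

# Every bijection G -> G that respects products is an automorphism.
automorphisms = []
for images in permutations(G):
    a = dict(zip(G, images))
    if all(a[compose(g, h)] == compose(a[g], a[h]) for g in G for h in G):
        automorphisms.append(a)

# The conjugations c_g(x) = g x g^-1, one for each g in G.
inner = [{x: compose(compose(g, x), inverse(g)) for x in G} for g in G]
distinct_inner = len({tuple(sorted(c.items())) for c in inner})

print(len(automorphisms), distinct_inner, all(c in automorphisms for c in inner))
```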

(8.5) Proposition. The group of automorphisms of the cyclic group of order p is
isomorphic to the multiplicative group $\mathbb{F}_p^\times$ of nonzero elements of $\mathbb{F}_p$.

Proof. The method here is to use the additive group $\mathbb{F}_p^+$ as the model for a
cyclic group of order p. It is generated by the element 1. Let us denote the multiplicative
group $\mathbb{F}_p^\times$ by G. Then G operates on $\mathbb{F}_p^+$ by left multiplication, and this
operation defines an injective homomorphism $\varphi: G \to \mathrm{Perm}(\mathbb{F}_p)$ to the group of
permutations of the set $\mathbb{F}_p$ of p elements.

Next, the group $A = \mathrm{Aut}(\mathbb{F}_p^+)$ of automorphisms is a subgroup of $\mathrm{Perm}(\mathbb{F}_p^+)$.
The distributive law shows that multiplication by an element $a \in \mathbb{F}_p^\times$ is an automorphism
of $\mathbb{F}_p^+$. It is bijective, and $a(x + y) = ax + ay$. Therefore the image of
$\varphi: G \to \mathrm{Perm}(\mathbb{F}_p^+)$ is contained in the subgroup A. Finally, an automorphism of
$\mathbb{F}_p^+$ is determined by where it sends the generator 1, and the image of 1 can not be
zero. Using the operations of G, we can send 1 to any nonzero element. Therefore $\varphi$
is a surjection from G onto A. Being both injective and surjective, $\varphi$ is an isomorphism. □
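
For a small prime the statement of (8.5) can be checked directly. The sketch below takes p = 7 as an arbitrary choice and represents a map of $\mathbb{F}_p^+$ as the list of its values; the names `is_automorphism`, `multiplications` and `brute` are illustrative assumptions.

```python
from itertools import permutations

p = 7   # any small prime works; 7 keeps the brute-force search quick

def is_automorphism(f):
    """f[x] is the image of x in Z/p; require f bijective and additive."""
    return sorted(f) == list(range(p)) and all(
        f[(x + y) % p] == (f[x] + f[y]) % p for x in range(p) for y in range(p))

# Multiplication by each nonzero a, written out as a list of values.
multiplications = {a: [(a * x) % p for x in range(p)] for a in range(1, p)}

# phi(ab) equals phi(a) composed with phi(b).
is_hom = all(
    multiplications[(a * b) % p]
    == [multiplications[a][multiplications[b][x]] for x in range(p)]
    for a in multiplications for b in multiplications)

# Independent brute-force list of all automorphisms of the additive group.
brute = [list(f) for f in permutations(range(p)) if is_automorphism(list(f))]

print(len(brute), is_hom, sorted(brute) == sorted(multiplications.values()))
```

The brute-force list and the list of multiplication maps coincide, so for this p every automorphism is multiplication by a nonzero element.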

9. FINITE SUBGROUPS OF THE ROTATION GROUP 

In this section, we will apply the Counting Formula to classify finite subgroups of
the rotation group $SO_3$, which was defined in Chapter 4 (5.4). As happens with finite
groups of motions of the plane, there are rather few finite subgroups of $SO_3$, and all
of them are symmetry groups of familiar figures.

(9.1) Theorem. Every finite subgroup G of $SO_3$ is one of the following:

$C_k$: the cyclic group of rotations by multiples of $2\pi/k$ about a line;

$D_k$: the dihedral group (3.4) of symmetries of a regular k-gon;

T: the tetrahedral group of twelve rotations carrying a regular tetrahedron to itself;

O: the octahedral group of order 24 of rotations of a cube, or of a regular octahedron;

I: the icosahedral group of 60 rotations of a regular dodecahedron or a regular icosahedron.

We will not attempt to classify the infinite subgroups.

Proof. Let G be a finite subgroup of $SO_3$, and denote its order by N. Every element
g of G except the identity is a rotation about a line $\ell$, and this line is obviously
unique. So g fixes exactly two points of the unit sphere S in $\mathbb{R}^3$, namely the two
points of intersection $\ell \cap S$. We call these points the poles of g. Thus a pole is a
point p on the unit sphere such that $gp = p$ for some element $g \ne 1$ of G. For example,
if G is the group of rotational symmetries of a tetrahedron $\Delta$, then the poles
will be the points of S lying over the vertices, the centers of faces, and the centers of
edges of $\Delta$.

Let P denote the set of all poles.

(9.2) Lemma. The set P is carried to itself by the action of G on the sphere. So G
operates on P.

Proof. Let p be a pole, say the pole of $g \in G$. Let x be an arbitrary element
of G. We have to show that $xp$ is a pole, meaning that $xp$ is left fixed by
some element $g'$ of G other than the identity. The required element is $xgx^{-1}$:
$xgx^{-1}(xp) = xgp = xp$, and $xgx^{-1} \ne 1$ because $g \ne 1$. □

We are now going to get information about the group by counting the poles.
Since every element of G except 1 has two poles, our first guess might be that there
are $2N - 2$ poles altogether. This isn't quite correct, because the same point p may
be a pole for more than one group element.

The stabilizer of a pole p is the group of all of the rotations about the line
$\ell = (0, p)$ which are in G. This group is cyclic and is generated by the rotation of
smallest angle $\theta$ in G. [See the proof of Theorem (3.4a).] If the order of the stabilizer
is $r_p$, then $\theta = 2\pi/r_p$.

We know that $r_p > 1$ because, since p is a pole, the stabilizer $G_p$ contains an
element besides 1. By the Counting Formula (7.2),

$|G_p|\,|O_p| = |G|$.

We write this equation as

(9.3) $r_p n_p = N$,

where $n_p$ is the number of poles in the orbit $O_p$ of p.

The set of elements of G with a given pole p is the stabilizer $G_p$, minus the
identity element. So there are $(r_p - 1)$ group elements with p as pole. On the other
hand, every group element g except 1 has two poles. Having to subtract 1 everywhere
is a little confusing here, but the correct relation is

(9.4) $\sum_{p \in P} (r_p - 1) = 2N - 2$.

Now if $p$ and $p'$ are in the same orbit, then the stabilizers $G_p$ and $G_{p'}$ have the
same order. This is because $O_p = O_{p'}$ and $|G| = |G_p|\,|O_p| = |G_{p'}|\,|O_{p'}|$. Therefore
we can collect together the terms on the left side of (9.4) which correspond
to poles in a given orbit $O_p$. There are $n_p$ such terms, so the terms collected
together sum to $n_p(r_p - 1)$. Let us number the orbits in some way, as $O_1, O_2, \ldots$.
Then

$\sum_i n_i(r_i - 1) = 2N - 2$,

where $n_i = |O_i|$, and $r_i = |G_p|$ for any $p \in O_i$. Since $N = n_i r_i$, we can divide both
sides by N and switch sides, to get the famous formula

(9.5) $2 - \frac{2}{N} = \sum_i \left(1 - \frac{1}{r_i}\right)$.

This formula may not look very promising at first glance, but actually it tells us a
great deal. The left side is less than 2, while each term on the right is at least $\frac{1}{2}$. It
follows that there can be at most three orbits!
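
Relations (9.4) and (9.5) can be checked numerically on one of the groups of Theorem (9.1). The sketch below uses the rotation group of the cube $[-1, 1]^3$, listed as signed permutation matrices of determinant $+1$. Two assumptions are made for convenience: poles are represented by the 26 nonzero vectors with entries in {-1, 0, 1} instead of unit vectors, and a rotation is encoded as a pair `(p, e)`. None of the names come from the text.

```python
from fractions import Fraction
from itertools import permutations, product

def parity(p):
    """Sign of a permutation of (0, 1, 2), computed by counting inversions."""
    inversions = sum(1 for i in range(3) for j in range(i + 1, 3) if p[i] > p[j])
    return -1 if inversions % 2 else 1

# Rotation (p, e) sends v to (e[0]*v[p[0]], e[1]*v[p[1]], e[2]*v[p[2]]).
G = [(p, e) for p in permutations(range(3)) for e in product((1, -1), repeat=3)
     if parity(p) * e[0] * e[1] * e[2] == 1]
identity = ((0, 1, 2), (1, 1, 1))
N = len(G)

def act(g, v):
    p, e = g
    return tuple(e[i] * v[p[i]] for i in range(3))

# Candidate pole directions; no two of them are positive multiples of each other.
candidates = [v for v in product((-1, 0, 1), repeat=3) if any(v)]
poles = [v for v in candidates
         if any(g != identity and act(g, v) == v for g in G)]

# Stabilizer order r_p for every pole, and the left side of (9.4).
r = {p: sum(1 for g in G if act(g, p) == p) for p in poles}
lhs_94 = sum(r_p - 1 for r_p in r.values())

# One stabilizer order per orbit, for the right side of (9.5).
orbit_orders, seen = [], set()
for p in poles:
    if p not in seen:
        seen |= {act(g, p) for g in G}
        orbit_orders.append(r[p])
rhs_95 = sum(1 - Fraction(1, r_i) for r_i in orbit_orders)

print(len(poles), lhs_94, 2 * N - 2, sorted(orbit_orders),
      rhs_95 == 2 - Fraction(2, N))
```

The three orbits found here are the edge midpoints, the vertices, and the face centers of the cube, with stabilizer orders 2, 3 and 4.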

The rest of the classification is made by listing the various possibilities:

One orbit: $2 - \frac{2}{N} = 1 - \frac{1}{r}$. This is impossible, because $2 - \frac{2}{N} \ge 1$, while
$1 - \frac{1}{r} < 1$.

Two orbits: $2 - \frac{2}{N} = \left(1 - \frac{1}{r_1}\right) + \left(1 - \frac{1}{r_2}\right)$, that is, $\frac{2}{N} = \frac{1}{r_1} + \frac{1}{r_2}$.

We know that $r_i \le N$, because $r_i$ divides N. This equation can hold only if
$r_1 = r_2 = N$. Thus $n_1 = n_2 = 1$. There are two poles $p, p'$, both fixed by every element
of the group. Obviously, G is the cyclic group $C_N$ of rotations about the line $\ell$
through $p$ and $p'$.


Three orbits: This is the main case: Formula (9.5) reduces to

$\frac{2}{N} = \frac{1}{r_1} + \frac{1}{r_2} + \frac{1}{r_3} - 1$.

We arrange the $r_i$ in increasing order. Then $r_1 = 2$. For if all $r_i$ were at least 3, then
the right side would be $\le 0$, which is impossible.

Case 1: At least two of the orders $r_i$ are 2: $r_1 = r_2 = 2$. The third order $r_3 = r$ can
be arbitrary, and $N = 2r$. Then $n_3 = 2$: There is one pair of poles $\{p, p'\}$ making
the orbit $O_3$. Every element g either fixes $p$ and $p'$ or interchanges them. So the elements
of G are rotations about $\ell = (p, p')$, or else they are rotations by $\pi$ about a
line $\ell'$ perpendicular to $\ell$. It is easily seen that G is the group of rotations fixing a
regular r-gon $\Delta$, the dihedral group $D_r$. The polygon $\Delta$ lies in the plane perpendicular
to $\ell$, and the vertices and the centers of the edges of $\Delta$ correspond to the remaining
poles. The bilateral (reflection) symmetries of the polygon in $\mathbb{R}^2$ have become
rotations through the angle $\pi$ when $\Delta$ is put into $\mathbb{R}^3$.

Case 2: Only one $r_i$ is 2: The triples $r_1 = 2$, $r_2 \ge 4$, $r_3 \ge 4$ are impossible, because
$1/2 + 1/4 + 1/4 - 1 = 0$. Similarly, $r_1 = 2$, $r_2 = 3$, $r_3 \ge 6$ can not occur because
$1/2 + 1/3 + 1/6 - 1 = 0$. There remain only three possibilities:

(9.6)

(i) $r_i = (2, 3, 3)$, $N = 12$;

(ii) $r_i = (2, 3, 4)$, $N = 24$;

(iii) $r_i = (2, 3, 5)$, $N = 60$.
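
The case analysis for three orbits amounts to solving $\frac{2}{N} = \frac{1}{r_1} + \frac{1}{r_2} + \frac{1}{r_3} - 1$ in integers $r_i \ge 2$. The sketch below enumerates the solutions up to a search bound; the bound `BOUND = 12` is an arbitrary assumption, needed only because the dihedral family $(2, 2, r)$ is infinite.

```python
from fractions import Fraction

BOUND = 12   # hypothetical search bound on the stabilizer orders r_i

found = []
for r1 in range(2, BOUND + 1):
    for r2 in range(r1, BOUND + 1):
        for r3 in range(r2, BOUND + 1):
            excess = Fraction(1, r1) + Fraction(1, r2) + Fraction(1, r3) - 1
            if excess > 0:
                N = 2 / excess        # from 2/N = 1/r1 + 1/r2 + 1/r3 - 1
                found.append(((r1, r2, r3), N))

dihedral = [(t, N) for t, N in found if t[:2] == (2, 2)]   # Case 1: N = 2r
sporadic = [(t, N) for t, N in found if t[:2] != (2, 2)]   # Case 2
for triple, N in sporadic:
    print(triple, N)
```

Exact fractions are used so that the borderline triples such as (2, 4, 4) and (2, 3, 6), where the right side is exactly 0, are excluded without rounding trouble.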


It remains to analyze these three cases. We will indicate the configurations
briefly.

(9.7)

(i) $n_i = (6, 4, 4)$. The poles in the orbit $O_2$ are the vertices of a regular tetrahedron
$\Delta$, and G is the group of rotations fixing it: $G = T$. Here $n_1$ is the number
of edges of $\Delta$, and $n_2, n_3$ are the numbers of vertices and faces of $\Delta$.

(ii) $n_i = (12, 8, 6)$. The poles in $O_2$ are the vertices of a cube, and the poles in $O_3$
are the vertices of a regular octahedron. $G = O$ is the group of their rotations.
The integers $n_i$ are the numbers of edges, vertices, and faces of a cube.

(iii) $n_i = (30, 20, 12)$. The poles of $O_2$ are the vertices of a regular dodecahedron,
and those in $O_3$ are the vertices of a regular icosahedron: $G = I$.

There is still some work to be done to prove the assertions of (9.7). Intuitively,
the poles in an orbit should be the vertices of a regular polyhedron because
they form a single orbit and are therefore evenly spaced on the sphere. However this
is not quite accurate, because the centers of the edges of a cube, for example, form a
single orbit but do not span a regular polyhedron. (The figure they span is called a
truncated polyhedron.)

As an example, consider (9.7iii). Let p be one of the 12 poles in $O_3$, and let q
be one of the poles of $O_2$ nearest to p. Since the stabilizer of p is of order 5 and operates
on $O_2$ (because G does), the images of q provide a set of five nearest neighbors
to p, the poles obtained from q by the five rotations about p in G. Therefore the
number of poles of $O_2$ nearest to p is a multiple of 5, and it is easily seen that 5 is the
only possibility. So these five poles are the vertices of a regular pentagon. The 12
pentagons so defined form a regular dodecahedron. □

We close this chapter by remarking that our discussion of the motions of the
plane has analogues for the group $M_3$ of rigid motions of 3-space. In particular, one
can define the notion of crystallographic group, which is a discrete subgroup whose
translation group is a three-dimensional lattice L. To say that L is a lattice means
that there are three linearly independent vectors $a, b, c$ in $\mathbb{R}^3$ such that
$t_a, t_b, t_c \in G$. The crystallographic groups are analogous to lattice groups in
$M = M_2$, and crystals form examples of three-dimensional configurations having
such groups as symmetry. We imagine the crystal to be infinitely large. Then the fact
that the molecules are arranged regularly implies that they form an array having
three independent translational symmetries. It has been shown that there are 230
types of crystallographic groups, analogous to the 17 lattice groups (4.15). This is
too long a list to be very useful, and so crystals have been classified more crudely
into seven crystal systems. For more about this, and for a discussion of the 32 crystallographic
point groups, look in a book on crystallography.


A good inheritance is worth more than the prettiest problem of geometry,
because it takes the place of a general method,
and serves to solve many problems.

Gottfried Wilhelm Leibnitz


EXERCISES 

1. Symmetry of Plane Figures

1. Prove that the set of symmetries of a figure F in the plane forms a group.

2. List all symmetries of (a) a square and (b) a regular pentagon.

3. List all symmetries of the following figures.

(a) (1.4) (b) (1.5) (c) (1.6) (d) (1.7)

4. Let G be a finite group of rotations of the plane about the origin. Prove that G is cyclic. 

2. The Group of Motions of the Plane 

1. Compute the fixed point of $t_a\rho_\theta$ algebraically.

2. Verify the rules (2.5) by explicit calculation, using the definitions (2.3).

3. Prove that O is not a normal subgroup of M.

4. Let m be an orientation-reversing motion. Prove that $m^2$ is a translation.

5. Let SM denote the subset of orientation-preserving motions of the plane. Prove that SM
is a normal subgroup of M, and determine its index in M.

6. Prove that a linear operator on $\mathbb{R}^2$ is a reflection if and only if its eigenvalues are 1 and
$-1$, and its eigenvectors are orthogonal.

7. Prove that a conjugate of a reflection or a glide reflection is a motion of the same type,
and that if m is a glide reflection then the glide vectors of m and of its conjugates have
the same length.

8. Complete the proof that (2.13) is a homomorphism.

9. Prove that the map $M \to \{1, r\}$ defined by $t_a\rho_\theta \mapsto 1$, $t_a\rho_\theta r \mapsto r$ is a homomorphism.

10. Compute the effect of rotation of the axes through an angle $\eta$ on the expressions $t_a\rho_\theta$ and
$t_a\rho_\theta r$ for a motion.

11. (a) Compute the eigenvalues and eigenvectors of the linear operator $m = \rho_\theta r$.

(b) Prove algebraically that m is a reflection about a line through the origin, which subtends
an angle of $\frac{1}{2}\theta$ with the x-axis.

(c) Do the same thing as in (b) geometrically.

12. Compute the glide vector of the glide $t_a\rho_\theta r$ in terms of $a$ and $\theta$.

13. (a) Let m be a glide reflection along a line $\ell$. Prove geometrically that a point x lies on $\ell$
if and only if $x, m(x), m^2(x)$ are colinear.

(b) Conversely, prove that if m is an orientation-reversing motion and x is a point such
that $x, m(x), m^2(x)$ are distinct points on a line $\ell$, then m is a glide reflection
along $\ell$.

14. Find an isomorphism from the group SM to the subgroup of $GL_2(\mathbb{C})$ of matrices of the
form $\begin{bmatrix} a & b \\ 0 & 1 \end{bmatrix}$, with $|a| = 1$.

15. (a) Write the formulas for the motions (2.3) in terms of the complex variable
$z = x + iy$.

(b) Show that every motion has the form $m(z) = \alpha z + \beta$ or $m(z) = \alpha\bar{z} + \beta$, where
$|\alpha| = 1$ and $\beta$ is an arbitrary complex number.

3. Finite Groups of Motions 

1. Let $D_n$ denote the dihedral group (3.6). Express the product $x^2yx^{-1}y^{-1}x^3y^3$ in the form
$x^iy^j$ in $D_n$.

2. List all subgroups of the group and determine which are normal. 

3. Find all proper normal subgroups and identify the quotient groups of the groups $D_{13}$ and
$D_{15}$.

4. (a) Compute the cosets of the subgroup $H = \{1, x^5\}$ in the dihedral group $D_{10}$ explicitly.

(b) Prove that $D_{10}/H$ is isomorphic to $D_5$.

(c) Is $D_{10}$ isomorphic to $D_5 \times H$?

5. List the subgroups of $G = D_6$ which do not contain $N = \{1, x^3\}$.

6. Prove that every finite subgroup of M is a conjugate subgroup of one of the standard subgroups
listed in Corollary (3.5).

4. Discrete Groups of Motions 

1. Prove that a discrete group G consisting of rotations about the origin is cyclic and is generated
by $\rho_\theta$ where $\theta$ is the smallest angle of rotation in G.

2. Let G be a subgroup of M which contains rotations about two different points. Prove algebraically
that G contains a translation.

3. Let $(a, b)$ be a lattice basis of a lattice L in $\mathbb{R}^2$. Prove that every other lattice basis has the
form $(a', b') = (a, b)P$, where P is a $2 \times 2$ integer matrix whose determinant is $\pm 1$.

4. Determine the point group for each of the patterns depicted in Figure (4.16). 

5. (a) Let 5 be a square of side length a, and let e >0. Let 5 be a subset of B such that the 

distance between any two points of 5 is > e. Find an explicit upper bound for the 
number of elements in S. 

(b) Do the same thing for a box B in U n t 
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Determine the point group of G. 

12. Let G be the group of symmetries of an equilateral triangular lattice L . Find the index in 
G of the subgroup T D G. 

13* Let G be a discrete group in which every element is orientation-preserving. Prove that 
the point group G is a cyclic group of rotations and that there is^a point p in the plane 
such that the set of group elements which fix p is isomorphic to G. 

14. With each of the patterns shown，find a pattern with the same type of symmetry in 
(4.16). 




I 



6 . Prove that the subgroup of generated by 1 and V2 is dense in [R+ • 

1 • Prove that every discrete subgroup of O is finite. 

8 . Let G be a discrete subgroup of M. Prove that there is a point p 0 in the plane which is not 
fixed by any point of G except the identity, 

9. Prove that the group of symmetries of the frieze pattern 

...GGEGGGG6666... 

is isomorphic to the direct product C 2 X Coo of a cyclic group of order 2 and an infinite 
cyclic group. 

10. Let G be the group of symmetries of the frieze pattern ■•. L nr l hHhr l hH... 

(a) Determine the point group $\overline{G}$ of G.

(b) For each element $\bar{g} \in \overline{G}$, and each element $g \in G$ which represents $\bar{g}$, describe the
action of g geometrically.

(c) Let H be the subgroup of translations in G. Determine [G : H].

11. Let G be the group of symmetries of the pattern 


<><><><><><>< 

<><><><><><>< 

><><><><><><> 

<><><><><><>< 

><><><><><><> 

<><><><><><>< 

><><><><><><> 

<><><><><><>< 

><><><><><><> 

<><><><><><>< 

><><><><><><> 

<><><><><><>< 

><><><><><><> 

<><><><><><>< 

><><><><><><> 


<><><><><><>< 

<><><><><><>< 

<><><><><><>< 

<><><><><><>< 

<><><><><><>< 

<><><><><><>< 

<><><><><><>< 

<><><><><><>< 

<><><><><><>< 
<><><><><><><
<><><><><><><

15. Let N denote the group of rigid motions of the line $\ell = \mathbb{R}^1$. Some elements of N are

$t_a\colon x \mapsto x + a,\ a \in \mathbb{R}$, $\quad s\colon x \mapsto -x$.

(a) Show that $\{t_a, t_a s\}$ are all of the elements of N, and describe their actions on $\ell$
geometrically.

(b) Compute the products $t_a t_b$, $s t_a$, $ss$.

(c) Find all discrete subgroups of N which contain a translation. It will be convenient to
choose your origin and unit length with reference to the particular subgroup. Prove
that your list is complete.

*16. Let $N'$ be the group of motions of an infinite ribbon

$R = \{(x, y) \mid -1 \le y \le 1\}$.

It can be viewed as a subgroup of the group M. The following elements are in $N'$:

$t_a\colon (x, y) \mapsto (x + a, y)$
$s\colon (x, y) \mapsto (-x, y)$
$r\colon (x, y) \mapsto (x, -y)$
$\rho\colon (x, y) \mapsto (-x, -y)$.

(a) Show that these elements generate $N'$, and describe the elements of $N'$ as products.

(b) State and prove analogues of (2.5) for these motions. 

(c) A frieze pattern is any pattern on the ribbon which is periodic and not degenerate, in 
the sense that its group of symmetries is discrete. Since it is periodic, its group of 
symmetries will contain a translation. Some sample patterns are depicted in the text 
(1.3, 1.4, 1.6, 1.7). Classify the symmetry groups which arise, identifying those
which differ only in the choice of origin and unit length on the ribbon. I suggest that
you begin by trying to make patterns with different kinds of symmetry. Please make
a careful case analysis when proving your results. A suitable format would be as follows:
Let G be a discrete subgroup containing a translation.

Case 1: Every element of G is a translation. Then ...,

Case 2: G contains the rotation $\rho$ but no orientation-reversing symmetry. Then ...,
and so on.

*17. Let L be a lattice of $\mathbb{R}^2$, and let a, b be linearly independent vectors lying in L. Show that
the subgroup $L' = \{ma + nb \mid m, n \in \mathbb{Z}\}$ of L generated by a, b has finite index, and
that the index is the number of lattice points in the parallelogram whose vertices are
0, a, b, a + b and which are not on the "far edges" [a, a + b] and [b, a + b]. (So, 0 is
included, and so are points which lie on the edges [0, a], [0, b], except for the points a, b
themselves.)

18. (a) Find a subset F of the plane which is not fixed by any motion $m \in M$.

(b) Let G be a discrete group of motions. Prove that the union S of all images of F by
elements of G is a subset whose group of symmetries $G'$ contains G.

(c) Show by an example that $G'$ may be larger than G.

*(d) Prove that there exists a subset F such that $G' = G$.

*19. Let G be a lattice group such that no element $g \ne 1$ fixes any point of the plane. Prove
that G is generated by two translations, or else by one translation and one glide.

*20. Let G be a lattice group whose point group is $D_1 = \{1, r\}$.

(a) Show that the glide lines and the lines of reflection of G are all parallel.

(b) Let $L = L_G$. Show that L contains nonzero vectors $a = (a_1, 0)^t$, $b = (0, b_2)^t$.

(c) Let a and b denote the smallest vectors of the type indicated in (b). Then either (a, b)
or (a, c) is a lattice basis for L, where $c = \frac{1}{2}(a + b)$.

(d) Show that if coordinates in the plane are chosen so that the x-axis is a glide line,
then G contains one of the elements $g = r$ or $g = t_{\frac{1}{2}a}\, r$. In either case, show that
$G = L \cup Lg$.

(e) There are four possibilities described by the dichotomies (c) and (d). Show that there 
are only three different kinds of group. 

21. Prove that if the point group of a lattice group G is $C_6$, then $L = L_G$ is an equilateral
triangular lattice, and G is the group of all rotational symmetries of L about the origin.

22. Prove that if the point group of a lattice group G is $D_6$, then $L = L_G$ is an equilateral
triangular lattice, and G is the group of all symmetries of L.

*23. Prove that symmetry groups of the figures in Figure (4.16) exhaust the possibilities.

5. Abstract Symmetry: Group Operations 

1. Determine the group of automorphisms of the following groups. 

(a) $C_4$ (b) $C_6$ (c) $C_2 \times C_2$

2. Prove that (5.4) is an equivalence relation.

3. Let S be a set on which G operates. Prove that the relation $s \sim s'$ if $s' = gs$ for some
$g \in G$ is an equivalence relation.

4. Let $\varphi\colon G \to G'$ be a homomorphism, and let S be a set on which $G'$ operates. Show
how to define an operation of G on S, using the homomorphism $\varphi$.

5. Let $G = D_4$ be the dihedral group of symmetries of the square.

(a) What is the stabilizer of a vertex? an edge? 

(b) G acts on the set of two elements consisting of the diagonal lines. What is the stabilizer
of a diagonal?

6. In each of the figures in exercise 14 of Section 4, find the points which have nontrivial 
stabilizers, and identify the stabilizers. 

*7. Let G be a discrete subgroup of M. 

(a) Prove that the stabilizer $G_p$ of a point p is finite.

(b) Prove that the orbit $O_p$ of a point p is a discrete set, that is, that there is a number
$\epsilon > 0$ so that the distance between two distinct points of the orbit is at least $\epsilon$.

(c) Let $B, B'$ be two bounded regions in the plane. Prove that there are only finitely
many elements $g \in G$ so that $gB \cap B'$ is nonempty.

8. Let $G = GL_n(\mathbb{R})$ operate on the set $S = \mathbb{R}^n$ by left multiplication.

(a) Describe the decomposition of S into orbits for this operation.

(b) What is the stabilizer of $e_1$?

9. Decompose the set $\mathbb{C}^{2 \times 2}$ of 2 × 2 complex matrices into orbits for the following
operations of $GL_2(\mathbb{C})$:

(a) Left multiplication
*(b) Conjugation

10. (a) Let $S = \mathbb{R}^{m \times n}$ be the set of real m × n matrices, and let $G = GL_m(\mathbb{R}) \times GL_n(\mathbb{R})$.
Prove that the rule $(P, Q), A \mapsto PAQ^{-1}$ defines an operation of G on S.

(b) Describe the decomposition of S into G-orbits.

(c) Assume that $m \le n$. What is the stabilizer of the matrix $[I \mid 0]$?

11. (a) Describe the orbit and the stabilizer of the matrix [...] under conjugation in
$GL_n(\mathbb{R})$.

(b) Interpreting the matrix in $GL_2(\mathbb{F}_3)$, find the order (the number of elements) of the
orbit.

12. (a) Define automorphism of a field.

(b) Prove that the field $\mathbb{Q}$ of rational numbers has no automorphism except the identity.

(c) Determine Aut F, when $F = \mathbb{Q}[\sqrt{2}]$.

6. The Operation on Cosets

1. What is the stabilizer of the coset aH for the operation of G on G/H?

2. Let G be a group, and let H be the cyclic subgroup generated by an element x of G.
Show that if left multiplication by x fixes every coset of H in G, then H is a normal
subgroup.

3. (a) Exhibit the bijective map (6.4) explicitly, when G is the dihedral group $D_4$ and S is
the set of vertices of a square.

(b) Do the same for $D_n$ and the vertices of a regular n-gon.

4. (a) Describe the stabilizer H of the index 1 for the action of the symmetric group $G = S_n$
on $\{1, \ldots, n\}$ explicitly.

(b) Describe the cosets of H in G explicitly for this action.

(c) Describe the map (6.4) explicitly.

5. Describe all ways in which $S_3$ can operate on a set of four elements.

6. Prove Proposition (6.5). 

7. A map $\varphi\colon S \to S'$ of G-sets is called a homomorphism of G-sets if $\varphi(gs) = g\varphi(s)$ for all
$s \in S$ and $g \in G$. Let $\varphi$ be such a homomorphism. Prove the following:

(a) The stabilizer $G_{\varphi(s)}$ contains the stabilizer $G_s$.

(b) The orbit of an element $s \in S$ maps onto the orbit of $\varphi(s)$.

7. The Counting Formula

1. Use the counting formula to determine the orders of the group of rotational symmetries
of a cube and of the group of rotational symmetries of a tetrahedron.

2. Let G be the group of rotational symmetries of a cube C. Two regular tetrahedra A, A' 
can be inscribed in C, each using half of the vertices. What is the order of the stabilizer 
of A? 

3. Compute the order of the group of symmetries of a dodecahedron, when orientation-reversing
symmetries such as reflections in planes, as well as rotations, are allowed. Do
the same for the symmetries of a cube and of a tetrahedron.

4. Let G be the group of rotational symmetries of a cube, let $S_v, S_e, S_f$ be the sets of vertices,
edges, and faces of the cube, and let $H_v, H_e, H_f$ be the stabilizers of a vertex, an
edge, and a face. Determine the formulas which represent the decomposition of each of
the three sets into orbits for each of the subgroups.

5. Let $G \supset H \supset K$ be groups. Prove the formula $[G : K] = [G : H][H : K]$ without the
assumption that G is finite.

6. (a) Prove that if H and K are subgroups of finite index of a group G, then the intersection
$H \cap K$ is also of finite index.

(b) Show by example that the index $[H : H \cap K]$ need not divide $[G : K]$.


8. Permutation Representations 

1. Determine all ways in which the tetrahedral group T (see (9.1)) can operate on a set of
two elements.

2. Let S be a set on which a group G operates, and let $H = \{g \in G \mid gs = s \text{ for all } s \in S\}$.
Prove that H is a normal subgroup of G.

3. Let G be the dihedral group of symmetries of a square. Is the action of G on the vertices 
a faithful action? on the diagonals? 

4. Suppose that there are two orbits for the operation of a group G on a set S, and that they
have orders m, n respectively. Use the operation to define a homomorphism from G to
the product $S_m \times S_n$ of symmetric groups.

5. A group G operates faithfully on a set S of five elements, and there are two orbits, one of
order 3 and one of order 2. What are the possibilities for G?

6. Complete the proof of Proposition (8.2).

7. Let $F = \mathbb{F}_3$. There are four one-dimensional subspaces of the space of column vectors
$F^2$. Describe them. Left multiplication by an invertible matrix permutes these subspaces.
Prove that this operation defines a homomorphism $\varphi\colon GL_2(F) \to S_4$. Determine the
kernel and image of this homomorphism.

*8. For each of the following groups, find the smallest integer n such that the group has a
faithful operation on a set with n elements.

(a) the quaternion group H (b) $D_4$ (c)


9. Finite Subgroups of the Rotation Group 

1. Describe the orbits of poles for the group of rotations of an octahedron and of an 
icosahedron. 

2. Identify the group of symmetries of a baseball, taking the stitching into account and allowing
orientation-reversing symmetries.

3. Let O be the group of rotations of a cube. Determine the stabilizer of a diagonal line 
connecting opposite vertices. 

4. Let G = O be the group of rotations of a cube, and let H be the subgroup carrying one
of the two inscribed tetrahedra to itself (see exercise 2, Section 7). Prove that H = T. 

5. Prove that the icosahedral group has a subgroup of order 10. 

6. Determine all subgroups of the following groups: 

(a) T (b) I 

7. Explain why the groups of symmetries of the cube and octahedron, and of the dodecahedron
and icosahedron, are equal.

*8. (a) The 12 points $(\pm 1, \pm a, 0)$, $(0, \pm 1, \pm a)$, $(\pm a, 0, \pm 1)$ form the vertices of a regular
icosahedron if a is suitably chosen. Verify this, and determine a.

(b) Determine the matrix of the rotation through the angle $2\pi/5$ about the origin in $\mathbb{R}^2$.

(c) Determine the matrix of the rotation of $\mathbb{R}^3$ through the angle $2\pi/5$ about the axis
containing the point $(1, a, 0)$.

*9. Prove the crystallographic restriction for three-dimensional crystallographic groups: A 
rotational symmetry of a crystal has order 2, 3, 4, or 6. 

Miscellaneous Problems 

1. Describe completely the following groups: 

(a) Aut $D_4$ (b) Aut H, where H is the quaternion group

2. (a) Prove that the set Aut G of automorphisms of a group G forms a group.

(b) Prove that the map $\varphi\colon G \to \operatorname{Aut} G$ defined by $g \mapsto$ (conjugation by g) is a homomorphism,
and determine its kernel.

(c) The automorphisms which are conjugation by a group element are called inner automorphisms.
Prove that the set of inner automorphisms, the image of $\varphi$, is a normal
subgroup of Aut G.

3. Determine the quotient group Aut H / Int H for the quaternion group H.

*4. Let G be a lattice group. A fundamental domain D for G is a bounded region in the
plane, bounded by piecewise smooth curves, such that the sets $gD$, $g \in G$, cover the
plane without overlapping except along the edges. We assume that D has finitely many
connected components.

(a) Find fundamental domains for the symmetry groups of the patterns illustrated in exercise
14 of Section 4.

(b) Show that any two fundamental domains $D$, $D'$ for G can be cut into finitely many
congruent pieces of the form $gD \cap D'$ or $D \cap gD'$ (see exercise 7, Section 5).

(c) Conclude that D and $D'$ have the same area. (It may happen that the boundary
curves intersect infinitely often, and this raises some questions about the definition of
area. Disregard such points in your answer.)

*5. Let G be a lattice group, and let $p_0$ be a point in the plane which is not fixed by any element
of G. Let $S = \{gp_0 \mid g \in G\}$ be the orbit of $p_0$. The plane can be divided into
polygons, each one containing a single point of S, as follows: The polygon $\Delta_p$ containing
p is the set of points q whose distance from p is the smallest distance to any point of S:

$\Delta_p = \{q \in \mathbb{R}^2 \mid \operatorname{dist}(q, p) \le \operatorname{dist}(q, p') \text{ for all } p' \in S\}$.

(a) Prove that $\Delta_p$ is a polygon.

(b) Prove that $\Delta_p$ is a fundamental domain for G.

(c) Show that this method works for all discrete subgroups of M, except that the domain
$\Delta_p$ which is constructed need not be a bounded set.

(d) Prove that $\Delta_p$ is bounded if and only if the group is a lattice group.

*6. (a) Let $G' \subset G$ be two lattice groups. Let D be a fundamental domain for G. Show that
a fundamental domain $D'$ for $G'$ can be constructed out of finitely many translates
gD of D.

(b) Show that $[G : G'] < \infty$ and that $[G : G'] = \operatorname{area}(D')/\operatorname{area}(D)$.

(c) Compute the index $[G : L_G]$ for each of the patterns (4.16).

*7. Let G be a finite group operating on a finite set S. For each element $g \in G$, let $S^g$ denote
the subset of elements of S fixed by g: $S^g = \{s \in S \mid gs = s\}$.

(a) We may imagine a true-false table for the assertion that gs = s, say with rows indexed
by elements of G and columns indexed by elements of S. Construct such a table for
the action of the dihedral group $D_3$ on the vertices of a triangle.

(b) Prove the formula $\sum_{s \in S} |G_s| = \sum_{g \in G} |S^g|$.

(c) Prove Burnside's Formula:

$|G| \cdot (\text{number of orbits}) = \sum_{g \in G} |S^g|$.

8. There are $70 = \binom{8}{4}$ ways to color the edges of an octagon, making four black and four
white. The group $D_8$ operates on this set of 70, and the orbits represent equivalent colorings.
Use Burnside's Formula to count the number of equivalence classes.

9. Let G be a group of order n which operates nontrivially on a set of order r. Prove that if
$n > r!$, then G has a proper normal subgroup.
The more to do or to prove, the easier the doing or the proof.

James Joseph Sylvester 


1. THE OPERATIONS OF A GROUP ON ITSELF

By an operation of a group G on itself, we mean that in the definition of the operation,
G plays the role both of the group and of the set on which it operates. Any
group operates on itself in several ways, two of which we single out here. The first is
left multiplication:

(1.1) $G \times G \to G$
$(g, x) \mapsto gx$.

This is obviously a transitive operation of G on G, that is, G forms a single orbit,
and the stabilizer of any element is the identity subgroup {1}. So the action is faithful,
and the homomorphism

(1.2) $G \to \operatorname{Perm}(G)$
$g \mapsto m_g$ = left multiplication by g

defined in Chapter 5, Section 8 is injective.

(1.3) Theorem. Cayley's Theorem: Every finite group G is isomorphic to a subgroup
of a permutation group. If G has order n, then it is isomorphic to a subgroup
of the symmetric group $S_n$.

Proof. Since the operation by left multiplication is faithful, G is isomorphic to
its image in Perm (G). If G has order n, then Perm (G) is isomorphic to $S_n$. □

Though Cayley's Theorem is intrinsically interesting, it is not especially useful for
computation because $S_n$, having order n!, is too large in comparison with n.
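To make the embedding concrete, here is a minimal sketch that carries out the construction for one small group. It assumes G is taken to be $S_3$ written as tuples, and the helper names `compose`, `index`, and `left_mult` are ours, introduced only for illustration. Each g is sent to the permutation of the six element labels induced by $x \mapsto gx$.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

# The group G = S_3, written as permutations of {0, 1, 2}.
G = sorted(permutations(range(3)))
index = {g: k for k, g in enumerate(G)}   # label the six elements 0..5

def left_mult(g):
    """Permutation of the labels 0..5 induced by x -> g x."""
    return tuple(index[compose(g, x)] for x in G)

images = {g: left_mult(g) for g in G}

# Faithful: distinct group elements give distinct permutations of the labels.
assert len(set(images.values())) == len(G)

# Homomorphism: left multiplication by gh is (left mult. by g) ∘ (left mult. by h).
assert all(images[compose(g, h)] == compose(images[g], images[h])
           for g in G for h in G)

print(len(G), "elements embedded in S_6")
```

The example also shows the size problem mentioned above: a group of order 6 lands inside a symmetric group of order 720.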

The second operation we will consider is more subtle. It is conjugation, the
map $G \times G \to G$, defined by

(1.4) $(g, x) \mapsto gxg^{-1}$.

For obvious reasons, we will not use multiplicative notation for this operation. You
should verify the axioms (5.1) in Chapter 5, introducing a temporary notation such
as $g * x$ to denote the conjugate $gxg^{-1}$.

The stabilizer of an element $x \in G$ for the operation of conjugation has a special
name. It is called the centralizer of x and is denoted by Z(x):

(1.5) $Z(x) = \{g \in G \mid gxg^{-1} = x\} = \{g \in G \mid gx = xg\}$.

The centralizer is the set of group elements which commute with x. Note that
$x \in Z(x)$, because x commutes with itself.

The orbit of x for the operation of conjugation is called the conjugacy class of
x. It consists of all conjugate elements $gxg^{-1}$. We often write the conjugacy class as

(1.6) $C_x = \{x' \in G \mid x' = gxg^{-1} \text{ for some } g \in G\}$.

By the Counting Formula [Chapter 5 (7.2)], $|G| = |C_x|\,|Z(x)|$.

Since the conjugacy classes are orbits for a group operation, they partition G.
This gives us what is called the Class Equation for a finite group [see Chapter 5 (7.3)]:

(1.7) $|G| = \sum_{\text{conjugacy classes } C} |C|$.

If we number the conjugacy classes, say as $C_i$, $i = 1, \ldots, k$, then this formula reads

$|G| = |C_1| + \cdots + |C_k|$.

However there is some danger of confusion, because the subscript i in $C_i$ is an index,
while the notation $C_x$ as used above stands for the conjugacy class containing the element
x of G. In particular, $C_1$ has two meanings. Perhaps it will be best to list the
conjugacy class of the identity element 1 of G first. Then the two interpretations of
$C_1$ will agree.

Notice that the identity element is left fixed by all $g \in G$. Thus $C_1$ consists of
the element 1 alone. Note also that each term on the right side of (1.7), being the
order of an orbit, divides the left side. This is a strong restriction on the combinations
of integers which may occur in such an equation.

(1.8) The numbers on the right side of the Class Equation divide the
order of the group, and at least one of them is equal to 1.

For example, the conjugacy classes in the dihedral group $D_3$, presented as in
Chapter 5 (3.6), are the following three subsets:

$\{1\}$, $\{x, x^2\}$, $\{y, xy, x^2y\}$.

The two rotations $x, x^2$ are conjugate, as are the three reflections. The Class Equation
for $D_3$ is

(1.9) 6 = 1 + 2 + 3.
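The Class Equation (1.9) and the relation $|G| = |C_x|\,|Z(x)|$ can be checked by brute force. The sketch below assumes $D_3$ is modelled as the group of all permutations of the three vertices of the triangle; the function names are ours and purely illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

# D_3 acting on the three vertices of a triangle is all of S_3.
G = list(permutations(range(3)))

def conjugacy_class(x):
    """Orbit of x under conjugation: all g x g^{-1}."""
    return frozenset(compose(compose(g, x), inverse(g)) for g in G)

def centralizer(x):
    """Stabilizer of x under conjugation: all g with gx = xg."""
    return [g for g in G if compose(g, x) == compose(x, g)]

classes = {conjugacy_class(x) for x in G}
sizes = sorted(len(c) for c in classes)
print(sizes)  # [1, 2, 3]

# Class Equation: the class orders add up to |G|.
assert sum(sizes) == len(G)

# Counting Formula: |G| = |C_x| |Z(x)| for every x.
for x in G:
    assert len(conjugacy_class(x)) * len(centralizer(x)) == len(G)
```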

Recall from Chapter 2 (4.10) that the center of a group G is the set Z of elements
which commute with all elements of the group:

$Z = \{g \in G \mid gx = xg \text{ for all } x \in G\}$.

Now the conjugacy class of an element x consists of that element alone if and only if
$x = gxg^{-1}$ for all $g \in G$. This means that x is in the center. Thus the elements of
the center are represented by 1 on the right side of the Class Equation.

The next proposition follows directly from the definitions.

(1.10) Proposition. An element x is in the center of a group G if and only if its
centralizer Z(x) is the whole group. □

One case in which the Class Equation (1.7) can be used effectively is when the
order of G is a positive power of a prime p. Such a group is called a p-group. Here
are a few applications of the Class Equation to p-groups.

(1.11) Proposition. The center of a p-group G has order > 1.

Proof. The left side of (1.7) is a power of p, say $p^e$. Also, every term on the
right side is a power of p too, because it divides $p^e$. We want to show that some
group element $x \ne 1$ is in the center, which is the same as saying that more than one
term on the right side of (1.7) is equal to 1. Now the terms other than 1, being positive
powers of p, are divisible by p. Suppose that the class $C_1$ made the only contribution
of 1 to the right side. Then the equation would read

$p^e = 1 + \sum (\text{multiples of } p)$,

which is impossible unless e = 0. □

The argument used in this proof can be turned around and abstracted to give
the following important Fixed Point Theorem for actions of p-groups:

(1.12) Proposition. Let G be a p-group, and let S be a finite set on which G operates.
Assume that the order of S is not divisible by p. Then there is a fixed point for
the action of G on S, that is, an element $s \in S$ whose stabilizer is the whole group. □
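A short sketch of why (1.12) holds, following the pattern of the proof of (1.11). The symbols $O_1, \ldots, O_k$ for the orbits, $s_i$ for a chosen element of $O_i$, and e for the exponent in $|G| = p^e$ are introduced here only for illustration. Partition S into orbits and apply the Counting Formula to each orbit:

```latex
|S| = |O_1| + \cdots + |O_k|,
\qquad
|O_i| = [G : G_{s_i}] \ \text{ divides } \ |G| = p^e .
```

So each $|O_i|$ is a power of p. If no orbit had order 1, every term on the right would be divisible by p, and so would $|S|$, contrary to the hypothesis. Hence some orbit consists of a single element s, and the stabilizer of that s is the whole group.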

(1.13) Proposition. Every group of order $p^2$ is abelian.

Proof. Let G be a group of order $p^2$. We will show that for every $x \in G$, the
centralizer Z(x) is the whole group. Proposition (1.10) will then finish the proof. So
let $x \in G$. If x is in the center Z, then Z(x) = G as claimed. If $x \notin Z$, then Z(x) is
strictly larger than Z, because it contains Z and also contains the element x. Now the
orders of Z and Z(x) divide $|G| = p^2$, and Proposition (1.11) tells us that |Z| is at
least p. The only possibility is that $|Z(x)| = p^2$. Hence Z(x) = G, and x was in the
center after all. □

There are nonabelian groups of order $p^3$. The dihedral group $D_4$, for example,
has order 8.
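As a check on this example and on Proposition (1.11), the sketch below builds $D_4$ as permutations of the four vertices of a square and computes its center. The choice of generators (a quarter turn `rot` and a reflection `ref`) and the helper `generate` are assumptions made for the illustration.

```python
def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """Close a set of permutations under multiplication (finite case)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

rot = (1, 2, 3, 0)   # vertex i -> i + 1 (mod 4): rotation by a quarter turn
ref = (0, 3, 2, 1)   # vertex i -> -i (mod 4): a reflection fixing vertex 0

D4 = generate([rot, ref])
center = [z for z in D4 if all(compose(z, g) == compose(g, z) for g in D4)]

print(len(D4), len(center))  # 8 2

# Nonabelian, yet the center is nontrivial, as (1.11) predicts for a 2-group.
assert compose(rot, ref) != compose(ref, rot)
assert len(center) > 1
```

The center found here consists of the identity and the half turn.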

Let us use (1.13) to classify groups of order $p^2$.

(1.14) Corollary. Every group of order $p^2$ is of one of the following types:

(i) a cyclic group of order $p^2$;

(ii) a product of two cyclic groups of order p.

Proof. Since the order of an element divides $p^2$, there are two cases to
consider:

Case 1: G contains an element of order $p^2$ and is therefore a cyclic group.

Case 2: Every element x of G except the identity has order p. Let x, y be two elements
different from 1, and let $H_1, H_2$ be the cyclic groups of order p generated by x
and y respectively. We may choose y so that it is not a power of x. Then since
$y \notin H_1$, $H_1 \cap H_2$ is smaller than $H_2$, which has order p. So $H_1 \cap H_2 = \{1\}$. Also,
the subgroups $H_i$ are normal because G is abelian. Since $y \notin H_1$, the group $H_1H_2$ is
strictly larger than $H_1$, and its order divides $p^2$. Thus $H_1H_2 = G$. By Chapter 2
(8.6), $G \approx H_1 \times H_2$. □

The number of possibilities for groups of order $p^n$ increases rapidly with n.
There are five isomorphism classes of groups of order 8, and 14 classes of groups of
order 16.


2. THE CLASS EQUATION OF THE ICOSAHEDRAL GROUP 

In this section we determine the conjugacy classes in the icosahedral group I of rotational
symmetries of a dodecahedron, and use them to study this very interesting
group. As we have seen, the order of the icosahedral group is 60. It contains rotations
by multiples of $2\pi/5$ about the centers of the faces of the dodecahedron, by
multiples of $2\pi/3$ about the vertices, and by $\pi$ about the centers of the edges. Each
of the 20 vertices has a stabilizer of order 3, and opposite vertices have the same
stabilizer. Thus there are 10 subgroups of order 3, the stabilizers of the vertices.
Each subgroup of order 3 contains two elements of order 3, and the intersection of
any two of these subgroups consists of the identity element alone. So I contains
10 × 2 = 20 elements of order 3. Similarly, the faces have stabilizers of order 5,
and there are six such stabilizers, giving us 6 × 4 = 24 elements of order 5. There
are 15 stabilizers of edges, and these stabilizers have order 2. So there are 15 elements
of order 2. Finally, there is one element of order 1. Since

(2.1) 60 = 1 + 15 + 20 + 24,


we have listed all elements of the group. 
Equation (2.1) is obtained by partitioning the group according to the orders of
the elements. It is closely related to the Class Equation, but we can see that (2.1) is
not the Class Equation itself, because 24, which appears on the right side, does not
divide 60. On the other hand, we do know that conjugate elements have the same
order. So the Class Equation is obtained by subdividing this partition of G still further.
Also, note that the subgroups of order 3 are all conjugate. This is a general
property of group operations, because they are the stabilizers of the vertices, which
form a single orbit [Chapter 5 (6.5)]. The same is true for the subgroups of order 5
and for those of order 2.

Clearly the 15 elements of order 2, being the nontrivial elements in conjugate
subgroups of order 2, form one conjugacy class. What about the elements of order
3? Let x denote a counterclockwise rotation by $2\pi/3$ about a vertex v. Though x will
be conjugate to rotation with the same angle about any other vertex [Chapter 5
(6.5)], it is not so clear whether or not x is conjugate to $x^2$. Perhaps the first guess
would be that x and $x^2$ are not conjugate.

Let $v'$ denote the vertex opposite to v, and let $x'$ be the counterclockwise rotation
by $2\pi/3$ about $v'$. So x and $x'$ are conjugate elements of the group. Notice that
the counterclockwise rotation x about v is the same motion as the clockwise rotation
by $2\pi/3$ about the opposite vertex $v'$. Thus $x^2 = x'$, and this shows that x and $x^2$
are conjugate after all. It follows that all the elements of order 3 are conjugate. Similarly,
the 12 rotations by $2\pi/5$ and $-2\pi/5$ are conjugate. They are not conjugate to
the remaining 12 rotations by $4\pi/5$, $-4\pi/5$ of order 5. (One reason, as we have already
remarked, is that the order of a conjugacy class divides the order of the group,
and 24 does not divide 60.) Thus there are two conjugacy classes of elements of order 5,
and the Class Equation is

(2.2) 60 = 1 + 15 + 20 + 12 + 12.

We will now use this Class Equation to prove the following theorem.

(2.3) Theorem. The icosahedral group I has no proper normal subgroup. 

A group $G \ne \{1\}$ is called a simple group if it
contains no proper normal subgroup (no normal subgroup other than {1} and G).
Thus the theorem can be restated as follows:

(2.4) The icosahedral group is a simple group.

Cyclic groups of prime order contain no proper subgroup at all and are therefore
simple groups. All other groups, except for the trivial group, contain proper
subgroups, though not necessarily normal ones. We should emphasize that this use
of the word simple does not imply "uncomplicated." Its meaning here is roughly "not
compound."

Proof of Theorem (2.3). The proof of the following lemma is straightforward:
(2.5) Lemma.

(a) If a normal subgroup of a group G contains an element x, then it contains
the conjugacy class $C_x$ of x in G. In other words, a normal subgroup is a union
of conjugacy classes.

(b) The order of a normal subgroup of G is the sum of the orders of the conjugacy
classes which it contains. □

We now apply this lemma. The order of a proper normal subgroup of the icosahedral
group is a proper divisor of 60 and is also the sum of some of the terms on
the right side of the Class Equation (2.2), including the term 1. It happens that there
is no such integer. This proves the theorem. □
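The claim that there is no such integer is a finite check, and it can be done mechanically. The following sketch forms every sum of terms from the Class Equation 60 = 1 + 15 + 20 + 12 + 12 that includes the term 1 and keeps those which divide 60; the variable names are ours.

```python
from itertools import combinations

class_sizes = [15, 20, 12, 12]   # the terms other than 1 in the Class Equation
order = 60

candidates = set()
for r in range(len(class_sizes) + 1):
    for idxs in combinations(range(len(class_sizes)), r):
        total = 1 + sum(class_sizes[i] for i in idxs)   # always include the class {1}
        if order % total == 0:
            candidates.add(total)

print(sorted(candidates))  # [1, 60]
```

Only the orders 1 and 60 survive, which correspond to the subgroups {1} and the whole group, so no proper normal subgroup can exist.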

(2.6) Theorem. The icosahedral group is isomorphic to the alternating group $A_5$.

Proof. To describe this isomorphism, we need to find a set S of five elements
on which I operates. One such set consists of the five cubes which can be inscribed
into a dodecahedron, one of which is illustrated below:



(2.7) Figure. One of the cubes inscribed in a dodecahedron.

The group I operates on this set of cubes S, and this operation defines a homomorphism
$\varphi\colon I \to S_5$, the associated permutation representation. The map $\varphi$ is our isomorphism
from I to its image $A_5$. To show that it is an isomorphism, we will use the
fact that I is a simple group, but we need very little information about the operation
itself.

Since the kernel of $\varphi$ is a normal subgroup of I and since I is a simple group,
ker $\varphi$ is either {1} or I. To say ker $\varphi$ = I would mean that the operation of I on the
set of five cubes was the trivial operation, which it is not. Therefore ker $\varphi$ = {1},
and $\varphi$ is injective, defining an isomorphism of I onto its image in $S_5$.

Let us denote the image in $S_5$ by I too. We restrict the sign homomorphism
$S_5 \to \{\pm 1\}$ to I, obtaining a homomorphism $I \to \{\pm 1\}$. If this homomorphism
were surjective, its kernel would be a normal subgroup of I of order 30 [Chapter 2
(6.15)]. This is impossible because I is simple. Therefore the restriction is the trivial
homomorphism, which just means that I is contained in the kernel $A_5$ of the sign homomorphism.
Since both groups have order 60, $I = A_5$. □
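Since I is isomorphic to $A_5$, the Class Equation (2.2) should reappear when the conjugacy classes of $A_5$ are computed directly. The sketch below is one way to check this by brute force; it assumes $A_5$ is realized as the even permutations of {0, ..., 4}, and the helper names are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Return p∘q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def sign(p):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

A5 = [p for p in permutations(range(5)) if sign(p) == 1]

seen, sizes = set(), []
for x in A5:
    if x in seen:
        continue
    cls = {compose(compose(g, x), inverse(g)) for g in A5}
    seen |= cls
    sizes.append(len(cls))

print(len(A5), sorted(sizes))  # 60 [1, 12, 12, 15, 20]
```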

3. OPERATIONS ON SUBSETS 

Whenever a group G operates on a set S, there is also an operation on subsets. If
$U \subset S$ is a subset, then

(3.1) $gU = \{gu \mid u \in U\}$

is another subset of S. The axioms for an operation are clearly verified. So G operates
on the set of subsets of S. We can consider the operation on subsets of a given
order if we want to do so. Since multiplication by g is a permutation of S, the subsets
U and gU have the same order.

For example, let O be the octahedral group of 24 rotations of a cube, and let S
be the set of vertices of the cube. Consider the operation of O on subsets of order 2
of S, that is, on unordered pairs of vertices. There are 28 such pairs, and they form
three orbits for the group:

(i) {pairs of vertices on an edge};

(ii) {pairs which are opposite on a face of the cube};

(iii) {pairs which are opposite on the cube}.

These orbits have orders 12, 12, and 4 respectively; 28 = 12 + 12 + 4.

The stabilizer of a subset U is the set of group elements g such that gU = U.
Thus the stabilizer of a pair of opposite vertices on a face contains two elements:
the identity and the rotation by $\pi$ about the face. This agrees with the counting formula:
24 = 2 · 12.
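The orbit count for pairs of vertices can be reproduced numerically. This sketch assumes the cube has vertices (±1, ±1, ±1) and that its rotations are the signed permutation matrices of determinant 1; those modelling choices and all names are ours.

```python
from itertools import combinations, permutations, product

vertices = list(product((-1, 1), repeat=3))

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Signed permutation matrices: column j has the entry signs[j] in row perm[j].
rotations = []
for perm in permutations(range(3)):
    for signs in product((-1, 1), repeat=3):
        M = [[signs[j] if perm[j] == i else 0 for j in range(3)] for i in range(3)]
        if det3(M) == 1:          # keep orientation-preserving ones only
            rotations.append(M)

def apply(M, v):
    return tuple(sum(M[i][j] * v[j] for j in range(3)) for i in range(3))

def move(M, U):
    """The subset gU = {gu | u in U}."""
    return frozenset(apply(M, v) for v in U)

pairs = [frozenset(c) for c in combinations(vertices, 2)]
orbits = {frozenset(move(M, U) for M in rotations) for U in pairs}
sizes = sorted(len(o) for o in orbits)
print(len(rotations), len(pairs), sizes)  # 24 28 [4, 12, 12]

# Stabilizer of a pair of vertices opposite on the face x = 1.
face_pair = frozenset({(1, 1, 1), (1, -1, -1)})
stabilizer = [M for M in rotations if move(M, face_pair) == face_pair]
assert len(stabilizer) * 12 == len(rotations)   # counting formula: 24 = 2 · 12
```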

Note this important point once more: The equality gU = U does not mean that 
g leaves the elements in U fixed，but rather that g permutes the elements within U, 
that is, that gu ^ U whenever u E ： U. 

(3-2) Proposition. Let // be a group which operates on a set S, and let C/ be a sub¬ 
set of S. Then H stabilizes U if and only if t/ is a union of //-orbits. □ 

This proposition just restates the fact that the //-orbit of an element m E [/ is the set 
of all elements hu. If H stabilizes U, then U contains the //-orbit of any of its 
elements. □ 

Let’s consider the case that G operates by left multiplication on the subsets of 
G. Any subgroup // of G is a subset, and its orbit consists of the left cosets. This 
operation of G on cosets was defined in Chapter 5 (6.1). But any subset of G has an 
orbit* 

(3.3) Example. Let G = D_3 be the dihedral group of symmetries of an equilateral
triangle, presented as usual:

G = {x^i y^j | 0 ≤ i ≤ 2, 0 ≤ j ≤ 1, x^3 = 1, y^2 = 1, yx = x^2 y}.

This group contains 15 subsets of order 2, and we can decompose this set of 15 into
orbits for left multiplication. There are three subgroups of order 2:

(3.4) H_1 = {1, y}, H_2 = {1, xy}, H_3 = {1, x^2 y}.

Their cosets form three orbits of order 3. The other six subsets of order 2 form a
single orbit: 15 = 3 + 3 + 3 + 6. The orbit of six is

(3.5) {1, x}, {x, x^2}, {x^2, 1}, {y, x^2 y}, {xy, y}, {x^2 y, xy}. □
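The orbit count in Example (3.3) can be checked by brute force. The following is a minimal sketch, not part of the original text: it assumes D_3 is modeled as the six permutations of the triangle's vertices 0, 1, 2, and the helper names `mul`, `orbits`, and `left_mult` are illustrative.

```python
from itertools import combinations, permutations

# Model D_3 as the six permutations of the triangle's vertices 0, 1, 2.
G = list(permutations(range(3)))

def mul(g, h):
    """Composite permutation: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def orbits(points, act):
    """Partition `points` into orbits under the maps x -> act(g, x), g in G."""
    seen, result = set(), []
    for x in points:
        if x in seen:
            continue
        orbit = {act(g, x) for g in G}
        seen |= orbit
        result.append(orbit)
    return result

def left_mult(g, U):
    """Left multiplication of the subset U by g."""
    return frozenset(mul(g, u) for u in U)

# All subsets of order 2, then their orbit sizes under left multiplication.
pairs = [frozenset(c) for c in combinations(G, 2)]
orbit_sizes = sorted(len(o) for o in orbits(pairs, left_mult))
print(len(pairs), orbit_sizes)  # 15 [3, 3, 3, 6]
```

The three orbits of order 3 are the coset families of the three subgroups of order 2, and the remaining orbit is the one listed in (3.5).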

(3.6) Proposition. Let U be a subset of a group G. The order of the stabilizer
Stab(U) of U for the operation of left multiplication divides the order of U.

Proof. Let H denote the stabilizer of U. Proposition (3.2) tells us that U is a
union of orbits for the operation of H on G. These H-orbits are right cosets Hg. So U
is a union of right cosets. Hence the order of U is a multiple of |H|. □

Of course since the stabilizer is a subgroup of G, its order also divides |G|. So
if |U| and |G| have no common factor, then Stab(U) is the trivial subgroup {1}.

The operation by conjugation on subsets of G is also interesting. For example,
we can partition the 15 subsets of D_3 of order 2 into orbits for conjugation. The set
{H_1, H_2, H_3} of conjugate subgroups is one orbit, and the set {x, x^2} forms an orbit
by itself. The other orbits have orders 2, 3, and 6: 15 = 1 + 2 + 3 + 3 + 6.

For our purposes, the important thing is the orbit under conjugation of a subgroup
H ⊂ G. This orbit is the set of conjugate subgroups

{gHg^{-1} | g ∈ G}.

The subgroup H is normal if and only if its orbit consists of H alone, that is,
gHg^{-1} = H for all g ∈ G.

The stabilizer of a subgroup H for the operation of conjugation is called the
normalizer of H and is denoted by

(3.7) N(H) = {g ∈ G | gHg^{-1} = H}.

The Counting Formula reads

(3.8) |G| = |N(H)| · |{conjugate subgroups}|.

Hence the number of conjugate subgroups is equal to the index [G : N(H)].

Note that the normalizer always contains the subgroup

(3.9) N(H) ⊃ H,

because hHh^{-1} = H when h ∈ H. So by Lagrange's Theorem, |H| divides
|N(H)|, and |N(H)| divides |G|.

In example (3.3), the subgroups H_1, H_2, H_3 are all conjugate, and so
|N(H_i)| = 2; hence N(H_i) = H_i.

The definition of the normalizer N(H) shows that H is a normal subgroup of
N(H), and in fact N(H) is the largest group containing H as a normal subgroup. In
particular, N(H) = G if and only if H is a normal subgroup of G.
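The normalizer and the Counting Formula (3.8) can be illustrated on the same small group. This is a sketch under the assumption that D_3 is again modeled as the permutations of 0, 1, 2; the names `conjugate`, `H1`, and `K` are introduced here for illustration only.

```python
from itertools import permutations

G = list(permutations(range(3)))  # a model of D_3

def mul(g, h):
    """Composite permutation: apply h first, then g."""
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    """Inverse permutation."""
    out = [0] * 3
    for i, gi in enumerate(g):
        out[gi] = i
    return tuple(out)

def conjugate(g, H):
    """The conjugate subset g H g^-1."""
    return frozenset(mul(mul(g, h), inv(g)) for h in H)

identity = (0, 1, 2)
H1 = frozenset({identity, (1, 0, 2)})            # a subgroup of order 2
K = frozenset({identity, (1, 2, 0), (2, 0, 1)})  # the rotation subgroup

normalizer_H1 = [g for g in G if conjugate(g, H1) == H1]
conjugates_H1 = {conjugate(g, H1) for g in G}
normalizer_K = [g for g in G if conjugate(g, K) == K]

# Counting Formula: |G| = |N(H)| * (number of conjugate subgroups).
print(len(G), len(normalizer_H1), len(conjugates_H1))  # 6 2 3
```

Here N(H1) = H1, while the rotation subgroup K is normal, so its normalizer is all of G.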


4. THE SYLOW THEOREMS

The Sylow Theorems, which we will prove in this section, describe the subgroups of
prime power order of an arbitrary finite group.

Let G be a group of order n = |G|, and let p be a prime number which divides
n. We will use the following notation: p^e will denote the largest power of p dividing
n, so that

(4.1) n = p^e m

for some integer m, and p does not divide m.

(4.2) Theorem. First Sylow Theorem: There is a subgroup of G whose order is p^e.

The proofs of the Sylow Theorems are at the end of the section.

(4.3) Corollary. If a prime p divides the order of a finite group G, then G contains
an element of order p.

For, let H be a subgroup of order p^e, and let x be an element of H different from 1.
The order of x divides p^e, so it is p^r for some r in the range 0 < r ≤ e. Then
x^{p^{r-1}} has order p. □

Without the Sylow Theorem, this corollary is not obvious. We already know
that the order of any element divides |G|, but we might imagine a group of order 6,
for example, made up of the identity 1 and five elements of order 2. No such group
exists. According to (4.3), a group of order 6 must contain an element of order 3
and an element of order 2.

(4.4) Corollary. There are exactly two isomorphism classes of groups of order 6.
They are the classes of the cyclic group C_6 and of the dihedral group D_3.

Proof. Let x be an element of order 3 and y an element of order 2 in G. It is
easily seen that the six products x^i y^j, 0 ≤ i ≤ 2, 0 ≤ j ≤ 1, are distinct elements
of the group. For we can rewrite an equation x^i y^j = x^r y^s in the form
x^{i-r} = y^{s-j}. Every power of x except the identity has order 3, and every power
of y except the identity has order 2. Thus x^{i-r} = y^{s-j} = 1, which shows that
r = i and s = j. Since G has order 6, the six elements 1, x, x^2, y, xy, x^2 y run
through the whole group. In particular, yx must be one of them. It is not possible
that yx = y because this would imply x = 1. Similarly, yx ≠ 1, x, x^2. Therefore one
of the two relations

yx = xy or yx = x^2 y

holds in G. Either of these relations, together with x^3 = 1 and y^2 = 1, allows us to
determine the multiplication table for the group. Therefore there are at most two
isomorphism classes of groups of order 6. We know two already, namely the classes of
the cyclic group C_6 and of the dihedral group D_3. So they are the only ones. □

(4.5) Definition. Let G be a group of order n = p^e m, where p is a prime not
dividing m and e ≥ 1. The subgroups H of G of order p^e are called Sylow
p-subgroups of G, or often just Sylow subgroups.

Thus a Sylow p-subgroup is a p-subgroup whose index in the group is not divisible
by p. By Theorem (4.2), a finite group G always has a Sylow p-subgroup if p
divides the order of G. The remaining Sylow Theorems (4.6) and (4.8) give more
information about them.

(4.6) Theorem. Second Sylow Theorem: Let K be a subgroup of G whose order is
divisible by p, and let H be a Sylow p-subgroup of G. There is a conjugate subgroup
H' = gHg^{-1} such that K ∩ H' is a Sylow subgroup of K.

(4.7) Corollary.

(a) If K is any subgroup of G which is a p-group, then K is contained in a Sylow
p-subgroup of G.

(b) The Sylow p-subgroups of G are all conjugate.

It is clear that a conjugate of a Sylow subgroup is also a Sylow subgroup. So to
obtain the first part of the corollary, we only need to note that the Sylow subgroup of a
p-group K is the group K itself. So if H is a Sylow subgroup and K is a p-group,
there is a conjugate H' such that K ∩ H' = K, which is to say that H' contains K.
For part (b), let K and H be Sylow subgroups. Then there is a conjugate H' of H
which contains K. Since their orders are equal, K = H'. Thus K and H are
conjugate. □

(4.8) Theorem. Third Sylow Theorem: Let |G| = n, and n = p^e m as in (4.1).
Let s be the number of Sylow p-subgroups. Then s divides m and is congruent to 1
(modulo p): s | m, and s = ap + 1 for some integer a ≥ 0.
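The two conditions of Theorem (4.8) are easy to enumerate by machine. The sketch below uses a hypothetical helper, `sylow_count_candidates`, that lists the values of s the theorem allows for a given order n and prime p. It only applies the two arithmetic conditions and says nothing about which candidates actually occur.

```python
def sylow_count_candidates(n, p):
    """Values of s permitted by the Third Sylow Theorem for a group of order n."""
    if n % p != 0:
        raise ValueError("p must divide n")
    m = n
    while m % p == 0:  # strip the full power p^e, leaving m with p not dividing m
        m //= p
    # s must divide m and be congruent to 1 modulo p.
    return [s for s in range(1, m + 1) if m % s == 0 and s % p == 1]

print(sylow_count_candidates(15, 3))  # [1]
print(sylow_count_candidates(15, 5))  # [1]
print(sylow_count_candidates(21, 7))  # [1]
print(sylow_count_candidates(21, 3))  # [1, 7]
print(sylow_count_candidates(12, 2))  # [1, 3]
print(sylow_count_candidates(12, 3))  # [1, 4]
```

A single candidate, s = 1, forces the Sylow subgroup to be normal, since all Sylow p-subgroups are conjugate by Corollary (4.7b).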

Before proving these theorems, we will use them to determine the groups of
orders 15 and 21. These examples show how powerful the Sylow Theorems are, but
do not be misled. The classification of groups of order n is not easy when n has many
factors. There are just too many possibilities.

(4.9) Proposition.

(a) Every group of order 15 is cyclic.

(b) There are two isomorphism classes of groups of order 21: the class of the
cyclic group C_21 and the class of the group G having two generators x, y which
satisfy the relations x^7 = 1, y^3 = 1, yx = x^2 y.

Proof.

(a) Let G be a group of order 15. By (4.8) the number of its Sylow 3-subgroups
divides 5 and is congruent to 1 (modulo 3). The only such integer is 1. Therefore there is
one Sylow 3-subgroup H, and so it is a normal subgroup. There is one Sylow
5-subgroup K, and it is normal too, for similar reasons. Clearly, K ∩ H = {1}, because
the order of K ∩ H divides both 5 and 3. Also, KH is a subgroup of order > 5, and
hence KH = G. By (8.6) in Chapter 2, G is isomorphic to the product group H × K.
Thus every group of order 15 is isomorphic to a direct product of cyclic groups of
orders 3 and 5. All groups of order 15 are isomorphic. Since the cyclic group C_15 is
one of them, every group of order 15 is cyclic.

(b) Let G be a group of order 21. Then Theorem (4.8) shows that the Sylow
7-subgroup K must be normal. But the possibility that there are seven conjugate Sylow
3-subgroups H is not ruled out by the theorem, and in fact this case does arise. Let x
denote a generator for K, and y a generator for one of the Sylow 3-subgroups H.
Then x^7 = 1, y^3 = 1, and, since K is normal, yxy^{-1} = x^i for some i < 7.

We can restrict the possible exponents i by using the relation y^3 = 1. It implies
that

x = y^3 x y^{-3} = y^2 x^i y^{-2} = y x^{i^2} y^{-1} = x^{i^3}.

Hence i^3 ≡ 1 (modulo 7). This means that i can take the values 1, 2, 4.

Case 1: yxy^{-1} = x. The group is abelian, and by (8.6) in Chapter 2 it is isomorphic
to a direct product of cyclic groups of orders 3 and 7. Such a group is cyclic
[Chapter 2 (8.4)].

Case 2: yxy^{-1} = x^2. The multiplication in G can be carried out using the rules
x^7 = 1, y^3 = 1, yx = x^2 y, to reduce every product of the elements x, y to one of
the forms x^i y^j with 0 ≤ i < 7 and 0 ≤ j < 3. We leave the proof that this group
actually exists as an exercise.

Case 3: yxy^{-1} = x^4. In this case, we replace y by y^2, which is also a generator for
H, to reduce to the previous case: y^2 x y^{-2} = y x^4 y^{-1} = x^16 = x^2. Thus there are two
isomorphism classes of groups of order 21, as claimed. □
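The arithmetic in part (b) can be checked numerically, and the multiplication rule of Case 2 can be tested by brute force. The sketch below is one possible check, not the book's solution to the exercise: it assumes elements are encoded as pairs (a, b) standing for x^a y^b, and the names `make_mul`, `elements`, and the flags are illustrative.

```python
from itertools import product

# Exponents i with i^3 = 1 (mod 7): the only candidates for y x y^-1 = x^i.
candidates = [i for i in range(1, 7) if pow(i, 3, 7) == 1]

def make_mul(i):
    """Multiplication on pairs (a, b) = x^a y^b derived from the rule y x = x^i y."""
    def mul(u, v):
        (a, b), (c, d) = u, v
        # y^b x^c = x^(i^b * c) y^b, hence x^a y^b x^c y^d = x^(a + i^b c) y^(b + d).
        return ((a + pow(i, b, 7) * c) % 7, (b + d) % 3)
    return mul

elements = list(product(range(7), range(3)))  # 21 normal forms x^a y^b
mul = make_mul(2)
x, y = (1, 0), (0, 1)

associative = all(
    mul(mul(u, v), w) == mul(u, mul(v, w))
    for u in elements for v in elements for w in elements
)
has_inverses = all(any(mul(u, v) == (0, 0) for v in elements) for u in elements)
relation_holds = mul(y, x) == mul(mul(x, x), y)  # y x = x^2 y
nonabelian = mul(x, y) != mul(y, x)

print(candidates, associative, has_inverses, relation_holds, nonabelian)
```

The rule is well defined on exponents of y modulo 3 precisely because 2^3 ≡ 1 (modulo 7), which is the condition derived in the proof.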

We will now prove the Sylow Theorems.

Proof of the First Sylow Theorem. We let 𝒮 be the set of all subsets of G of
order p^e. One of these subsets is the subgroup we are looking for, but instead of
finding it directly we will show that one of these subsets has a stabilizer of order p^e.
The stabilizer will be the required subgroup.

(4.10) Lemma. The number of subsets of order p^e in a set of n = p^e m elements
(p not dividing m) is the binomial coefficient

N = (n choose p^e) = n(n-1) ⋯ (n-k) ⋯ (n-p^e+1) / [p^e (p^e-1) ⋯ (p^e-k) ⋯ 1].

Moreover N is not divisible by p.

Proof. It is a standard fact that the number of subsets of order p^e is this binomial
coefficient. To see that N is not divisible by p, note that every time p divides a
term (n - k) in the numerator of N, it also divides the term (p^e - k) of the
denominator exactly the same number of times: If we write k in the form k = p^i l, where p
does not divide l, then i < e. Therefore (n - k) and (p^e - k) are both divisible by
p^i but not divisible by p^{i+1}. □
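Lemma (4.10) can be spot-checked numerically. The following sketch uses Python's `math.comb`; the helper names `split_order` and `sylow_binomial` are hypothetical, and the check covers only the small range of n shown, so it illustrates the lemma rather than proving it.

```python
from math import comb

def split_order(n, p):
    """Write n = p^e * m with p not dividing m, and return (e, m)."""
    e, m = 0, n
    while m % p == 0:
        m //= p
        e += 1
    return e, m

def sylow_binomial(n, p):
    """The number N of subsets of order p^e in a set of n = p^e m elements."""
    e, _ = split_order(n, p)
    return comb(n, p ** e)

# Check that p does not divide N for every n < 200 and every prime p dividing n.
primes = [p for p in range(2, 200) if all(p % d for d in range(2, int(p ** 0.5) + 1))]
lemma_holds = all(
    sylow_binomial(n, p) % p != 0
    for n in range(2, 200) for p in primes if n % p == 0
)

print(sylow_binomial(12, 2), sylow_binomial(12, 3), lemma_holds)  # 495 220 True
```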

We decompose 𝒮 into orbits for the operation of left multiplication, obtaining
the formula

N = |𝒮| = Σ_{orbits O} |O|.

Since p does not divide N, some orbit has an order which is not divisible by p, say
the orbit O_U of the subset U. We now apply Proposition (3.6): |Stab(U)| divides
|U| = p^e, so |Stab(U)| is a power of p. Since

(4.11) |Stab(U)| · |O_U| = |G| = p^e m

by the Counting Formula, and since |O_U| is not divisible by p, it follows that
|Stab(U)| = p^e. This stabilizer is the required subgroup. □
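The proof is constructive enough to run on a small group. The sketch below follows it for the alternating group A_4 (order 12 = 2^2 · 3) with p = 2. Modeling A_4 as even permutations of 0, 1, 2, 3 and the helper names are assumptions made for this illustration.

```python
from itertools import combinations, permutations

def is_even(g):
    """Parity of a permutation, computed by counting inversions."""
    n = len(g)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if g[i] > g[j])
    return inversions % 2 == 0

G = [g for g in permutations(range(4)) if is_even(g)]  # A_4

def mul(g, h):
    """Composite permutation: apply h first, then g."""
    return tuple(g[h[i]] for i in range(4))

def stabilizer(U):
    """Elements g with gU = U for the operation of left multiplication."""
    return [g for g in G if frozenset(mul(g, u) for u in U) == U]

p_power = 4  # p^e for p = 2
subsets = [frozenset(c) for c in combinations(G, p_power)]
stabilizers = [stabilizer(U) for U in subsets]

# Every stabilizer order divides |U| = 4, as Proposition (3.6) predicts.
orders_divide = all(p_power % len(S) == 0 for S in stabilizers)

# A stabilizer of order exactly p^e is a Sylow 2-subgroup.
sylow = next(S for S in stabilizers if len(S) == p_power)
print(len(subsets), sorted(sylow))
```

The subgroup found this way consists of the identity and the three products of disjoint transpositions, that is, the Klein four group.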

Proof of the Second Sylow Theorem. We are given a subgroup K and a Sylow
subgroup H of G, and we are to show that for some conjugate subgroup H' of H, the
intersection K ∩ H' is a Sylow subgroup of K.

Let S denote the set of left cosets G/H. The facts that we need about this set
are that G operates transitively, that is, the set forms a single orbit, and that H is the
stabilizer of one of its elements, namely of s = 1H. So the stabilizer of as is the
conjugate subgroup aHa^{-1} [see Chapter 5 (6.5b)].

We restrict the operation of G to K and decompose S into K-orbits. Since H is
a Sylow subgroup, the order of S is prime to p. So there is some orbit O whose
order is prime to p. Say that O is the orbit of the element as. Let H' denote the
stabilizer aHa^{-1} of as for the operation of G. Then the stabilizer of as for the
restricted operation of K is obviously H' ∩ K, and the index [K : H' ∩ K] is |O|,
which is prime to p. Also, since it is a conjugate of H, H' is a p-group. Therefore
H' ∩ K is a p-group. It follows that H' ∩ K is a Sylow subgroup of K. □

Proof of the Third Sylow Theorem. By Corollary (4.7), the Sylow subgroups of
G are all conjugate to a given one, say to H. So the number of Sylow subgroups is
s = [G : N], where N is the normalizer of H. Since H ⊂ N, [G : N] divides
[G : H] = m. To show s ≡ 1 (modulo p), we decompose the set {H_1, …, H_s} of
Sylow subgroups into orbits for the operation of conjugation by H = H_1. An orbit
consists of a single group H_i if and only if H is contained in the normalizer N_i of H_i. If
so, then H and H_i are both Sylow subgroups of N_i, and H_i is normal in N_i. Corollary
(4.7b) shows that H = H_i. Therefore there is only one H-orbit of order 1, namely
{H}. The other orbits have orders divisible by p because their orders divide |H|, by
the Counting Formula. This shows that s ≡ 1 (modulo p). □


5. THE GROUPS OF ORDER 12

In this section, we use the Sylow Theorems to classify the groups of order 12:

(5.1) Theorem. There are five isomorphism classes of groups of order 12. They
are represented by:

(i) the product of cyclic groups C_3 × C_4;

(ii) the product of cyclic groups C_2 × C_2 × C_3;

(iii) the alternating group A_4;

(iv) the dihedral group D_6;

(v) the group generated by x, y with relations x^4 = 1, y^3 = 1, xy = y^2 x.

Note that C_3 × C_4 is isomorphic to C_12 and that C_2 × C_2 × C_3 is isomorphic to
C_2 × C_6 (see [Chapter 2 (8.4)]).

Proof. Let G be a group of order 12. Denote by H a Sylow 2-subgroup of G,
which has order 4, and by K a Sylow 3-subgroup, of order 3. It follows from
Theorem (4.8) that the number of Sylow 2-subgroups is either 1 or 3, and that the number
of Sylow 3-subgroups is 1 or 4. Also, H is a group of order 4 and is therefore either
a cyclic group or the Klein four group V, a product of two cyclic groups of order 2:

(5.2) H ≈ C_4 or H ≈ V.

(5.3) Lemma. At least one of the two subgroups H, K is normal.

Proof. Suppose that K is not normal. Then K has four conjugate subgroups
K = K_1, …, K_4. Since |K_i| = 3, the intersection of any two of these groups must be
the identity. Counting elements (the four groups contain 4 · 2 = 8 elements of order 3,
together with the identity) shows that there are only three elements of G which
are not in any of the groups K_i.

Any Sylow 2-subgroup H has order 4, and H ∩ K_i = {1}. Therefore it consists of
these three elements and 1. This describes H for us and shows that there is only one
Sylow 2-subgroup. Thus H is normal. □

Since H ∩ K = {1}, every element of HK has a unique expression as a
product hk [Chapter 2 (8.6)], and since |G| = 12, HK = G. If H is normal, then K
operates on H by conjugation, and we will show that this operation, together with
the structure of H and K, determines the structure of G. Similarly, if K is normal
then H operates on K, and this operation determines G.

Case 1: H and K are both normal. Then by (8.6) in Chapter 2, G is isomorphic to
the product group H × K. By (5.2) there are two possibilities:

(5.4) G ≈ C_4 × C_3 or G ≈ V × C_3.

These are the abelian groups of order 12.

Case 2: H is normal but K is not. So there are four conjugate Sylow 3-subgroups
{K_1, …, K_4}, and G operates by conjugation on this set S of four subgroups. This
operation determines a permutation representation

(5.5) φ: G → S_4.

Let us show that φ maps G isomorphically to the alternating group A_4 in this case.

The stabilizer of K_i for the operation of conjugation is the normalizer N(K_i),
which contains K_i. The Counting Formula shows that |N(K_i)| = 3, and hence that
N(K_i) = K_i. Since the only element common to the subgroups K_i is the identity
element, only the identity stabilizes all of these subgroups. Thus φ is injective and G is
isomorphic to its image in S_4.

Since G has four subgroups of order 3, it contains eight elements of order 3,
and these elements certainly generate the group. If x has order 3, then φ(x) is a
permutation of order 3 in S_4. The permutations of order 3 are even. Therefore
im φ ⊂ A_4. Since |G| = |A_4|, the two groups are equal.

As a corollary, we note that if H is normal and K is not, then H is the Klein
four group V, because the Sylow 2-subgroup of A_4 is V.

Case 3: K is normal, but H is not. In this case H operates on K by conjugation, and
conjugation by an element of H is an automorphism of K. We let y be a generator for
the cyclic group K: y^3 = 1. There are only two automorphisms of K: the identity
and the automorphism which interchanges y and y^2.

Suppose that H is cyclic of order 4, and let x generate H: x^4 = 1. Then since G
is not abelian, xy ≠ yx, and so conjugation by x is not the trivial automorphism of
K. Hence xyx^{-1} = y^2. The Todd-Coxeter Algorithm (see Section 9) is one way to
show that these relations define a group of order 12:

(5.6) x^4 = 1, y^3 = 1, xyx^{-1} = y^2.

The last possibility is that H is isomorphic to the Klein four group. Since there
are only two automorphisms of K, there is an element w ∈ H besides the identity
which operates trivially: wyw^{-1} = y. Since G is not abelian, there is also an element
v which operates nontrivially: vyv^{-1} = y^2. Then the elements of H are {1, v, w, vw},
and the relations v^2 = w^2 = 1 and vw = wv hold in H. The element x = wy has
order 6, and vxv^{-1} = vwyv^{-1} = wy^2 = y^2 w = x^{-1}. The relations x^6 = 1, v^2 = 1,
vxv^{-1} = x^{-1} define the group D_6, so G is dihedral in this case. □


6. COMPUTATION IN THE SYMMETRIC GROUP 

We want to bring up two points about calculation with permutations. The first
concerns the order of multiplication. To have a uniform convention, we have used the
functional notation p(x) for all our maps p, including permutations. This has the
consequence that a product pq must be interpreted as the composed operation p ∘ q,
that is, “first apply q, then p.” When multiplying permutations, it is more usual to
read pq as “first apply p, then q.” We will use this second convention here. A
compatible notation for the operation of a permutation p on an index i requires writing
the permutation on the right side of the index:

(i)p.

Applying first p and then q to an index i, we get ((i)p)q = (i)pq, as desired.
Actually, this notation looks funny to me. We will usually drop the parentheses:

(i)p = ip.

What is important is that p must appear on the right.

To make our convention for multiplication compatible with matrix multiplication,
we must replace the matrix P associated to a permutation p in Chapter 1 (4.6)
by its transpose P^t, and use it to multiply on the right on a row vector.

The second point is that it is not convenient to compute with permutation
matrices, because the matrices are large in relation to the amount of information they
contain. A better notation is needed. One way to describe a permutation is by means
of a table. We can consider the configuration

(6.1)   p = [ 1 2 3 4 5 6 7 8 ]
            [ 4 6 8 3 5 2 1 7 ]

as a notation for the permutation defined by

1p = 4, 2p = 6, ….

It is easy to compute products using this notation. If for example

        q = [ 1 2 3 4 5 6 7 8 ]
            [ 2 4 6 8 1 3 5 7 ]

then we can evaluate pq (first p, then q) by reading the two tables in succession:

       pq = [ 1 2 3 4 5 6 7 8 ]
            [ 8 3 7 6 1 4 2 5 ]

Table (6.1) still requires a lot of writing, and of course the top row is always
the same. It could, in principle, be left off, to reduce the amount of writing by half,
but this would make it hard to find our place in the bottom row if we were permuting,
say, 18 digits.

Another notation, called cycle notation, is commonly used. It describes a
permutation of n elements by at most n symbols and is based on the partition of the
indices into orbits for the operation of a permutation. Let p be a permutation, and let
H be the cyclic subgroup generated by p. We decompose the set {1, …, n} into
H-orbits and refer to these orbits as the p-orbits. The p-orbits form a partition of the
set of indices, called the cycle decomposition associated to the permutation p.

If an index i is in an orbit of k elements, the elements of the orbit will be

O = {i, ip, ip^2, …, ip^{k-1}}.

Let us denote ip^r by i_r, so that O = {i_0, i_1, …, i_{k-1}}. Then p operates on this orbit as

(6.2) i_0 ⇝ i_1 ⇝ ⋯ ⇝ i_{k-1} ⇝ i_0.

A permutation which operates in this way on a subset {i_0, i_1, …, i_{k-1}} of the indices
and leaves the remaining indices fixed is called a cyclic permutation. Thus

(6.3) σ: 1 ⇝ 4 ⇝ 3 ⇝ 8 ⇝ 7 ⇝ 1

defines a cyclic permutation of order 5 of {1, …, 8}, it being understood that the
indices 2, 5, 6 which are not mentioned are left fixed; each forms a σ-orbit of one
element. When we speak of the indices on which a permutation operates, we will
mean the ones which are not fixed: 1, 3, 4, 7, 8 in this case.

Another cyclic permutation of {1, …, 8} is

(6.4) τ: 2 ⇝ 6 ⇝ 2.

Such a cyclic permutation of order 2 is called a transposition. A transposition is a
permutation which operates on two indices.

Our permutation p (6.1) is not cyclic because there are three p-orbits:

{1, 4, 3, 8, 7}, {2, 6}, {5},

on which p operates as σ, as τ, and trivially, respectively. It is clear that

p = στ = τσ,

where στ denotes the product permutation.

(6.5) Proposition. Let σ, τ be permutations which operate on disjoint sets of
indices. Then στ = τσ.

Proof. If neither σ nor τ operates on an index i, then iστ = iτσ = i. If σ
sends i to j ≠ i, then τ fixes both i and j. In that case, iστ = jτ = j and
iτσ = iσ = j too. The case that τ operates on i is the same. □

Note, however, that when we multiply permutations which operate on overlapping
sets of indices, the operations need not commute. The symmetric group S_n is
not a commutative group if n > 2. For example, if τ' is the transposition which
interchanges 3 and 6 and if σ is as above, then στ' ≠ τ'σ.

(6.6) Proposition. Every permutation p not the identity is a product of cyclic
permutations which operate on disjoint sets of indices: p = σ_1 σ_2 ⋯ σ_k, and these cyclic
permutations σ_r are uniquely determined by p.

Proof. We know that p operates as a cyclic permutation when restricted to a
single orbit. For each p-orbit, we may define a cyclic permutation σ_r which permutes
that orbit in the same way that p does and which fixes the other indices. Clearly, p is
the product of these cyclic permutations. Conversely, let p be written as a product
σ_1 σ_2 ⋯ σ_k of cyclic permutations operating on distinct sets O_1, …, O_k of indices.
According to Proposition (6.5), the order does not matter. Note that σ_2, …, σ_k fix the
elements of O_1; hence p and σ_1 act in the same way on O_1. Therefore O_1 is a p-orbit.
The same is true for the other cyclic permutations. Thus O_1, …, O_k are the p-orbits
which contain more than one element, and the permutations σ_i are those constructed
at the start of the proof. □

A cycle notation for the cyclic permutation (6.2) is

(6.7) (i_0 i_1 ⋯ i_{k-1}).

Thus our particular permutation σ has the cycle notation (14387). The notation is
not completely determined by the permutation, because we can start the list with
any of the indices i_0, …, i_{k-1}. There are five equivalent notations for σ:

σ = (43871) = (38714) = ⋯.

Any one of these notations may be used.

A cycle notation for an arbitrary permutation p is obtained by writing the
permutation as a product of cyclic permutations which operate on disjoint indices, and
then writing the cycle notations for each of these permutations in succession. The
order is irrelevant. Thus two of the possible cycle notations for the permutation p
described above are

(14387)(26) and (62)(87143).

If we wish, we can include the “one-cycle” (5), to represent the fixed element 5,
thereby presenting all the indices in the list. But this is not customary.
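Passing from a table to cycle notation amounts to following each index until it returns to its starting point. Here is a minimal sketch, with the table for p stored as a dict and a hypothetical function name `cycle_decomposition`. It starts each cycle at its smallest index and omits one-cycles, which is one of the several equivalent notations.

```python
def cycle_decomposition(perm):
    """Disjoint cycles of a permutation given as a dict i -> ip, fixed points omitted."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:  # follow start -> start p -> start p^2 -> ...
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

p = {1: 4, 2: 6, 3: 8, 4: 3, 5: 5, 6: 2, 7: 1, 8: 7}
print(cycle_decomposition(p))  # [(1, 4, 3, 8, 7), (2, 6)]
```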

With this notation, every permutation can be denoted by a string of at most n
integers, suitably bracketed. Products can still be described by juxtaposition. A
cycle notation for the permutation q considered above is q = (124875)(36). Thus

(6.8) pq = (14387)(26)(124875)(36) = σ τ σ' τ',

where σ' = (124875) and τ' = (36). This string of cycles represents the permutation
pq. To evaluate the product on an index, the index is followed through the four factors:

1 ⇝ 4 ⇝ 4 ⇝ 8 ⇝ 8 (applying σ, τ, σ', τ' in turn), and so on.

However, (6.8) does not exhibit the decomposition of pq into disjoint cycles,
because indices appear more than once. Computation of the permutation as above leads
to the cycle decomposition

pq = (185)(237)(46).

When the computation is finished, every index occurs at most once.
For another sample, let ρ = (548). Then

(6.9)   σρ = (14387)(548) = (187)(354)
        ρσ = (548)(14387) = (147)(385).
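The products (6.8) and (6.9) can be reproduced by following every index through the factors from left to right. The sketch below assumes cycles are written as tuples of indices from 1 to 8; the helper names are illustrative and not from the text.

```python
def cycle_to_map(cycle, n=8):
    """The permutation of {1, ..., n} given by one cycle, as a dict i -> image."""
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm

def product(cycles, n=8):
    """Apply the cycles from left to right (first factor first)."""
    result = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        step = cycle_to_map(cycle, n)
        result = {i: step[j] for i, j in result.items()}
    return result

def cycle_decomposition(perm):
    """Disjoint cycles of a dict permutation, fixed points omitted."""
    seen, cycles = set(), []
    for start in sorted(perm):
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles

sigma, tau = (1, 4, 3, 8, 7), (2, 6)
sigma2, tau2 = (1, 2, 4, 8, 7, 5), (3, 6)
rho = (5, 4, 8)

print(cycle_decomposition(product([sigma, tau, sigma2, tau2])))  # [(1, 8, 5), (2, 3, 7), (4, 6)]
print(cycle_decomposition(product([sigma, rho])))                # [(1, 8, 7), (3, 5, 4)]
print(cycle_decomposition(product([rho, sigma])))                # [(1, 4, 7), (3, 8, 5)]
```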


Now let us compute the conjugate of a permutation p. Since p is a product of
disjoint cycles, it will be enough to describe the conjugate q^{-1}σq of a cyclic
permutation σ, say the permutation (i_1 ⋯ i_k). (The fact that we have switched the order of
multiplication makes the expression for conjugation by q^{-1} a little nicer than that for
conjugation by q.)


(6.10) Proposition.

(a) Let σ denote the cyclic permutation (i_1 i_2 ⋯ i_k), and let q be any permutation.
Denote the index i_r q by j_r. Then the conjugate permutation q^{-1}σq is the cyclic
permutation (j_1 j_2 ⋯ j_k).

(b) If an arbitrary permutation p is written as a product of disjoint cycles σ, then
q^{-1}pq is the product of the disjoint cycles q^{-1}σq.

(c) Two permutations p, p' are conjugate elements of the symmetric group if and
only if their cycle decompositions have the same orders.

Proof. The proof of (a) is the following computation:

j_r q^{-1}σq = i_r σq = i_{r+1} q = j_{r+1},

in which the indices are to be read modulo k. Part (b) follows easily. Also, the fact
that conjugate permutations have cycle decompositions with the same orders follows
from (b). Conversely, suppose that p and p' have cycle decompositions of the same
orders. Say that p = (i_1 ⋯ i_r)(i_1' ⋯ i_s') ⋯ and p' = (j_1 ⋯ j_r)(j_1' ⋯ j_s') ⋯. Define
q to be the permutation sending i_ν ⇝ j_ν, i_ν' ⇝ j_ν', and so on. Then

p' = q^{-1}pq. □
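Part (a) says that conjugation simply relabels the entries of a cycle. A quick check on the running example follows, using the cycle σ = (14387) and the permutation q from earlier in the section. The dict encoding and the function names are assumptions of this sketch.

```python
def cycle_to_map(cycle, n=8):
    """The permutation of {1, ..., n} given by one cycle, as a dict i -> image."""
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm

def conjugate(sigma, q):
    """q^-1 sigma q in the right-action convention: i -> ((i q^-1) sigma) q."""
    q_inv = {image: i for i, image in q.items()}
    return {i: q[sigma[q_inv[i]]] for i in q}

sigma_cycle = (1, 4, 3, 8, 7)
q = {1: 2, 2: 4, 3: 6, 4: 8, 5: 1, 6: 3, 7: 5, 8: 7}

# Relabel each entry i_r of the cycle by j_r = i_r q.
relabeled = tuple(q[i] for i in sigma_cycle)

print(relabeled)                                                          # (2, 8, 6, 7, 5)
print(conjugate(cycle_to_map(sigma_cycle), q) == cycle_to_map(relabeled))  # True
```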

Let us determine the Class Equation for the symmetric group S_4 as an example.
This group contains six transpositions

(12), (13), (14), (23), (24), (34),

three products of disjoint transpositions

(12)(34), (13)(24), (14)(23),

eight 3-cycles, and six 4-cycles. By Proposition (6.10), each of these sets forms one
conjugacy class. So the Class Equation of S_4 is

24 = 1 + 3 + 6 + 6 + 8.
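The class sizes can be tallied by grouping the 24 permutations by the orders of their cycles. The sketch below relies on Proposition (6.10c) to identify each cycle type with one conjugacy class. The indices are taken to be 0, 1, 2, 3 and the function name `cycle_type` is illustrative.

```python
from collections import Counter
from itertools import permutations

def cycle_type(g):
    """Orders of the cycles of g (a tuple i -> g[i]), fixed points included, largest first."""
    seen, lengths = set(), []
    for start in range(len(g)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = g[i]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

class_sizes = Counter(cycle_type(g) for g in permutations(range(4)))

print(dict(class_sizes))
print(sorted(class_sizes.values()), sum(class_sizes.values()))  # [1, 3, 6, 6, 8] 24
```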

We will now describe the subgroups G of the symmetric group S_p whose order
is divisible by p and whose Sylow p-subgroup is normal. We assume that p is a
prime integer. Since p divides p! = |S_p| only once, it also divides |G| once, and so
the Sylow p-subgroup of G is a cyclic group.

It turns out that such subgroups have a very nice description in terms of the
finite field F_p. To obtain it, we use the elements {0, 1, …, p-1} of the finite field as
the indices. Certain permutations of this set are given by the field operations
themselves. Namely, we have the operations (add a) and (multiply by c) for any given
a, c ∈ F_p, c ≠ 0. They are invertible operations and hence permutations of F_p, so
they represent elements of the symmetric group. For example, (add 1) is the p-cycle

(6.11) (add 1) = (0 1 2 ⋯ (p-1)).

The operator (multiply by c) always fixes the index 0, but its cycle decomposition
depends on the order of the element c in F_p^×. For example,

(6.12) (multiply by 2) = (1243)      if p = 5
                       = (124)(365)  if p = 7.

Combining the operations of addition and multiplication gives us all operators on F_p
of the form

(6.13) x ⇝ cx + a.

The set of these operators forms a subgroup G of order p(p-1) of the symmetric
group.
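The operators (6.13) can be listed explicitly for small p. In this sketch each operator is stored as the tuple of images of 0, 1, …, p-1. The names `affine_group`, `then`, and `cycles` are assumptions made for the example.

```python
def affine_group(p):
    """All maps x -> c*x + a on F_p with c != 0, each stored as a tuple of images."""
    return {tuple((c * x + a) % p for x in range(p))
            for c in range(1, p) for a in range(p)}

def then(f, g):
    """Right-action product f g: apply f first, then g."""
    return tuple(g[f[x]] for x in range(len(f)))

def cycles(f):
    """Disjoint cycles of f, fixed points omitted, each started at its smallest index."""
    seen, result = set(), []
    for start in range(len(f)):
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = f[i]
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result

A = affine_group(7)
closed = all(then(f, g) in A for f in A for g in A)

print(len(A), closed)                               # 42 True
print(cycles(tuple(2 * x % 5 for x in range(5))))   # [(1, 2, 4, 3)]
print(cycles(tuple(2 * x % 7 for x in range(7))))   # [(1, 2, 4), (3, 6, 5)]
```

A finite set of permutations that is closed under multiplication is a subgroup, so the closure check confirms the last claim for p = 7.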

The group of operators (6.13) has a nice matrix representation, as the set of
2 × 2 matrices with entries in the field F_p, of the form

(6.14)   [ 1  a ]
         [ 0  c ].

This matrix operates by right multiplication on the vector (1, x), sending it to
(1, cx + a). So we can recover the operation of G on F_p from right multiplication by
the corresponding matrix. (We use right multiplication because of our chosen order
of operations.) The operations (add a) and (multiply by c) are represented by the
elementary matrices

         [ 1  a ]        [ 1  0 ]
         [ 0  1 ]  and   [ 0  c ].


(6.15) Theorem. Let p be a prime, and let H be a subgroup of the symmetric
group S_p whose order is divisible by p. If the Sylow p-subgroup of H is normal,
then, with suitable labeling of the indices, H is contained in the group of operators
of the form (6.13).

For example, the dihedral group D_p operates faithfully on the vertices of a
regular p-gon, and so it is realized as a subgroup of the symmetric group S_p. It is the
subgroup of (6.14) consisting of the matrices in which c = ±1.

Proof of the theorem. The only elements of order p of S_p are the p-cycles. So
H contains a p-cycle, say σ. We may relabel indices so that σ becomes the standard
p-cycle (add 1) = (0 1 ⋯ (p-1)). Then this permutation generates the Sylow
p-subgroup of H.

Let τ_1 be another element of H. We have to show that τ_1 corresponds to an
operator of the form (6.13). Say that τ_1 sends the index 0 to i. Since σ^i also sends 0 to
i, the product τ = σ^{-i}τ_1 fixes 0. It suffices to show that τ has the form (6.13), and
to do so, we will show that τ is one of the operators (multiply by c).

By hypothesis, K = {1, σ, …, σ^{p-1}} is a normal subgroup of H. Therefore

(6.16) τ^{-1}στ = σ^k

for some k between 1 and p-1. We now determine τ by computing both sides of this
equation. By Proposition (6.10), the left side is the p-cycle τ^{-1}στ =
(0τ 1τ ⋯ (p-1)τ), while direct computation of the right side gives σ^k =
(0 k 2k ⋯ (p-1)k):

(0τ 1τ ⋯ (p-1)τ) = (0 k 2k ⋯ (p-1)k).

We must be careful in interpreting the equality of these two cycles, because the cycle
notation is not unique. We need to know that the first index on the left is the same as
the first index on the right. Otherwise we will have to identify equal indices in the
two cycles and begin with them. That is why we normalized at the start, to have
0τ = 0. Knowing that fact, the two lists are the same, and we conclude that

1τ = k, 2τ = 2k, ….

This is the operator (multiply by k), as claimed. □
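For a small prime the theorem can be checked exhaustively: any H in which K = ⟨σ⟩ is normal lies in the normalizer of K in S_p, so it is enough to compare that normalizer with the group of operators (6.13). The sketch below does this for p = 5. The encoding of permutations as tuples of images and the helper names are assumptions of the example.

```python
from itertools import permutations

p = 5
sigma = tuple((x + 1) % p for x in range(p))  # the standard p-cycle (add 1)

def then(f, g):
    """Right-action product f g: apply f first, then g."""
    return tuple(g[f[x]] for x in range(p))

def inverse(f):
    """Inverse permutation."""
    out = [0] * p
    for x, fx in enumerate(f):
        out[fx] = x
    return tuple(out)

# K = {1, sigma, ..., sigma^(p-1)}
K, power = set(), tuple(range(p))
for _ in range(p):
    K.add(power)
    power = then(power, sigma)

# Permutations t of S_p with t^-1 K t = K.
normalizer = {t for t in permutations(range(p))
              if {then(then(inverse(t), k), t) for k in K} == K}

affine = {tuple((c * x + a) % p for x in range(p))
          for c in range(1, p) for a in range(p)}

print(len(normalizer), normalizer == affine)  # 20 True
```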


We now return for a moment to the question of order of operations. If we wish
to use the notation p(i) for permutations in this section, as we do for functions
elsewhere, we must modify our way of computing with cycles in order to take this into
account. The most systematic way to proceed is to read everything, including cycles,
from right to left. In other words, we should read the cycle (14387) as

1 ← 4 ← 3 ← 8 ← 7 ← 1.

This is the inverse of the permutation (6.3). We can then interpret the product
(14387)(548) as composition: “First apply (548), then (14387).” Computation
of this product gives

1 ← 8 ← 7 ← 1,   3 ← 5 ← 4 ← 3,

which we would write as (187)(354). Notice that this is the same string of symbols
as we obtained in (6.9). Miraculously, reading everything backward gives the same
answer when we multiply permutations. But of course, the notation (187)(354) now
stands for the inverse of the permutation (6.9). The fact that the notations multiply
consistently in our two ways of reading permutations mitigates the crime we have
committed in switching from left to right.


7. THE FREE GROUP 

We have seen a few groups, such as the symmetric group S_3, the dihedral groups D_n,
and the group M of rigid motions of the plane, in which one can compute easily
using a list of generators and a list of relations for manipulating them. The rest of this
chapter is devoted to the formal background for such methods. In this section, we
consider groups which have a set of generators satisfying no relations other than ones
[such as x(yz) = (xy)z] which are implied by the group axioms. A set S of elements
of a group which satisfy no relations except those implied by the axioms is called
free, and a group which has a free set of generators is called a free group. We will
now describe the free groups.

We start with an arbitrary set $S$ of symbols, say $S = \{a, b, c, \ldots\}$, which may be
finite or infinite, and define a word to be a finite string of symbols from $S$, in which
repetition is allowed. For instance $a$, $aa$, $ba$, and $aaba$ are words. Two words can be
composed by juxtaposition:

$aa, ba \rightsquigarrow aaba$;

in this way the set $W$ of all words has an associative law of composition. Moreover,
the “empty word” can be introduced as an identity element for this law. We will
need a symbol to denote the empty word; let us use 1. The set $W$ is called the free
semigroup on the set of symbols $S$. Unfortunately it is not a group because inverses
are lacking, and the introduction of inverses complicates things.

Let $S'$ be the set consisting of the symbols in $S$ and also of symbols $a^{-1}$ for
every $a \in S$:

(7.1) $S' = \{a, a^{-1}, b, b^{-1}, c, c^{-1}, \ldots\}$.

Let $W'$ be the set of words made using the symbols $S'$. If a word $w \in W'$ looks
like

$\cdots xx^{-1} \cdots$ or $\cdots x^{-1}x \cdots$

for some $x \in S$, then we can agree to cancel the two symbols $x, x^{-1}$ and reduce the
length of the word. The word will be called reduced if no such cancellation can be
made. Starting with any word $w$, we can perform a finite sequence of cancellations
and must eventually get a reduced word, possibly the empty word 1. We call this
word $w_0$ a reduced form of $w$.

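Now there is often more than one way to proceed with cancellation. For instance,
starting with $w = babb^{-1}a^{-1}c^{-1}ca$, we can proceed in several ways, such as

$babb^{-1}a^{-1}c^{-1}ca \to baa^{-1}c^{-1}ca \to bc^{-1}ca \to ba$
(cancelling $bb^{-1}$, then $aa^{-1}$, then $c^{-1}c$)

or

$babb^{-1}a^{-1}c^{-1}ca \to babb^{-1}a^{-1}a \to babb^{-1} \to ba$
(cancelling $c^{-1}c$, then $a^{-1}a$, then $bb^{-1}$).

The same reduced word is obtained at the end, though the letters come from different
places in the original word. (In the first sequence the letters which remain at the end
are the first and last letters of $w$; in the second they are the first two letters.) This
is the general situation.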
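The two cancellation orders above can be replayed in a few lines. This is only an illustrative sketch under an assumed encoding: a lowercase letter stands for a symbol of $S$ and the matching uppercase letter for its inverse, so $w$ becomes the string `"babBACca"`. The function names are hypothetical.

```python
def cancel_once(word, from_right=False):
    """Cancel one adjacent pair x x^-1 or x^-1 x; return None if the word is reduced.

    The pair is searched for from the left end, or from the right end if requested.
    """
    positions = range(len(word) - 1)
    if from_right:
        positions = reversed(positions)
    for i in positions:
        a, b = word[i], word[i + 1]
        if a != b and a.lower() == b.lower():   # same letter, opposite case
            return word[:i] + word[i + 2:]
    return None

def reduced_form(word, from_right=False):
    """Apply cancellations in a fixed order until none is possible."""
    while True:
        shorter = cancel_once(word, from_right)
        if shorter is None:
            return word
        word = shorter

w = "babBACca"                                   # b a b b^-1 a^-1 c^-1 c a
left_first = reduced_form(w)                     # cancels b b^-1 first
right_first = reduced_form(w, from_right=True)   # cancels c^-1 c first
```

Both strategies end at the word `ba`, in agreement with the example; this illustrates, but of course does not prove, the uniqueness statement that follows.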


(7.2) Proposition. There is only one reduced form of a given word $w$.

Proof. We use induction on the length of $w$. If $w$ is reduced, there is nothing to
show. If not, there must be some pair of letters which can be cancelled, say the
pair $xx^{-1}$ in

$w = \cdots xx^{-1} \cdots$.

(Let us allow $x$ to denote any element of $S'$, with the obvious convention that if
$x = a^{-1}$ then $x^{-1} = a$.) If we show that we can obtain every reduced form $w_0$ of $w$
by cancelling the pair $xx^{-1}$ first, then the proposition will follow by induction on the
shorter word thus obtained.

Let $w_0$ be a reduced form of $w$. We know that $w_0$ is obtained from $w$ by some
sequence of cancellations. The first case is that our pair $xx^{-1}$ is cancelled at some step
in this sequence. Then we might as well rearrange the operations and cancel $xx^{-1}$
first. So this case is settled. On the other hand, the pair $xx^{-1}$ cannot remain in $w_0$,
since $w_0$ is reduced. Therefore at least one of the two symbols must be cancelled at
some time. If the pair itself is not cancelled, then the first cancellation involving the
pair must look like

$\cdots x^{-1}xx^{-1} \cdots$ (cancelling the first two symbols shown) or $\cdots xx^{-1}x \cdots$ (cancelling the last two symbols shown).

Notice that the word obtained by this cancellation is the same as that obtained by
cancelling the original pair $xx^{-1}$. So we may cancel the original pair at this stage
instead. Then we are back in the first case, and the proposition is proved. □

Now we call two words $w, w'$ in $W'$ equivalent, and we write $w \sim w'$, if they
have the same reduced form. This is an equivalence relation.

(7.3) Proposition. The product of equivalent words is equivalent: If $w \sim w'$ and
$v \sim v'$, then $wv \sim w'v'$.

Proof. To obtain the reduced word equivalent to the product $wv$, we can first
cancel as much as possible in $w$ and in $v$, to reduce $w$ to $w_0$ and $v$ to $v_0$. Then $wv$ is
reduced to $w_0v_0$. Now we continue cancelling in $w_0v_0$ if possible. Since $w' \sim w$ and
$v' \sim v$, the same process, applied to $w'v'$, passes through $w_0v_0$ too, and hence it
leads to the same reduced word. □

It follows from this proposition that equivalence classes of words may be multiplied,
that is, that there is a well-defined law of composition on the set of equivalence
classes of words.

(7.4) Proposition. Let $F$ denote the set of equivalence classes of words in $W'$.
Then $F$ is a group with the law of composition induced from $W'$.

Proof. The facts that multiplication is associative and that the class of the
empty word 1 is an identity follow from the corresponding facts in $W'$. It remains to
check that all elements of $F$ are invertible. But clearly, if $w = xy \cdots z$, then the class
of $z^{-1} \cdots y^{-1}x^{-1}$ is the inverse of the class of $w$. □

(7.5) Definition. The group $F$ of equivalence classes of words is called the free
group on the set $S$.

So an element of the free group $F$ corresponds to exactly one reduced word in
$W'$, by Proposition (7.2). To multiply reduced words, combine and cancel:

$(abc^{-1})(cb) \rightsquigarrow abc^{-1}cb = abb$.

One can also introduce power notation for reduced words: $aaab^{-1}b^{-1} = a^3b^{-2}$.
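The rule "combine and cancel" is easy to model. The sketch below assumes, purely for illustration, that a lowercase letter denotes a generator and the corresponding uppercase letter its inverse; the names `reduce_word`, `multiply`, and `inverse` are hypothetical.

```python
def reduce_word(word):
    """Reduced form of a word, computed with a stack of surviving letters."""
    stack = []
    for s in word:
        if stack and stack[-1] == s.swapcase():   # s cancels the previous letter
            stack.pop()
        else:
            stack.append(s)
    return "".join(stack)

def multiply(u, v):
    """Product in the free group of the elements represented by u and v."""
    return reduce_word(u + v)

def inverse(w):
    """If w = xy...z, the inverse is represented by z^-1 ... y^-1 x^-1."""
    return reduce_word(w)[::-1].swapcase()

product = multiply("abC", "cb")   # (a b c^-1)(c b)
```

Here `product` is the word `abb`, as in the displayed computation, and `multiply(w, inverse(w))` returns the empty string, which plays the role of the empty word 1.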

The free group on the set $S = \{a\}$ consisting of one element is the same as the
set of all powers of $a$: $F = \{a^n\}$. It is an infinite cyclic group. In contrast, the free
group on a set $S = \{a, b\}$ of two elements is very complicated.

8. GENERATORS AND RELATIONS

Having described free groups, we now consider the more likely case that a set of
generators of a group is not free, that is, that there are some nontrivial relations among
them. Our discussion is based on the mapping properties of the free group and of
quotient groups.

(8.1) Proposition. Mapping property of the free group: Let $F$ be the free group on
a set $S = \{a, b, \ldots\}$, and let $G$ be a group. Every map of sets $f: S \to G$ extends in a
unique way to a group homomorphism $\varphi: F \to G$. If we denote the image $f(x)$ of
an element $x \in S$ by $\bar{x}$, then $\varphi$ sends a word in $S' = \{a, a^{-1}, b, b^{-1}, \ldots\}$ to the
corresponding product of the elements $\bar{a}, \bar{a}^{-1}, \bar{b}, \bar{b}^{-1}, \ldots$ in $G$.

Proof. This rule does define a map on the set of words in $S'$. We must show
that equivalent words are sent to the same product in $G$. But since cancellation in a
word will not change the corresponding product in $G$, this is clear. Also, since
multiplication in $F$ is defined by juxtaposition, the map $\varphi$ thus defined is a
homomorphism. It is the only way to extend $f$ to a homomorphism. □
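As an illustration of the mapping property, the sketch below extends a map of sets $f$ on $S = \{x, y\}$ to words. Several things here are assumptions made for the example and are not taken from the text: the target group is a group of permutations of $\{0, \ldots, n-1\}$ stored as tuples, an uppercase letter denotes the inverse of the corresponding generator, and $f$ sends $x$ to a rotation and $y$ to a reflection of a regular $n$-gon with $n = 5$.

```python
def compose(p, q):
    """The product pq of two permutations of 0..n-1, defined by i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))

def invert(p):
    result = [0] * len(p)
    for i, v in enumerate(p):
        result[v] = i
    return tuple(result)

def extend(f):
    """Extend a map of sets f (dict: letter -> permutation) to all words."""
    n = len(next(iter(f.values())))
    def phi(word):
        result = tuple(range(n))                  # the empty word goes to 1
        for s in word:
            g = f[s] if s in f else invert(f[s.lower()])
            result = compose(result, g)           # juxtaposition becomes product
        return result
    return phi

n = 5
f = {"x": tuple((i + 1) % n for i in range(n)),   # rotation of the n-gon
     "y": tuple((-i) % n for i in range(n))}      # a reflection
phi = extend(f)
identity = tuple(range(n))
```

With these choices the equivalent words `xXy` and `y` have the same image, and the words `xxxxx`, `yy`, and `xyxy` are all sent to the identity, so they lie in the kernel of `phi`.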

If $S$ is any subset of a group $G$, the mapping property defines a homomorphism
$\varphi: F \to G$ from the free group on $S$ to $G$. This reflects the fact that the elements of
$S$ satisfy no relations in $F$ except those implied by the group axioms, and explains
the reason for the adjective free.

A family $S$ of elements is said to generate a group $G$ if the map $\varphi$ from the free
group on $S$ to $G$ is surjective. This is the same as saying that every element of $G$ is a
product of some string of elements of $S'$, so it agrees with the terminology introduced
in Section 2 of Chapter 2. In any case, whether or not $S$ generates $G$, the image
of the homomorphism $\varphi$ of Proposition (8.1) is a subgroup called the subgroup
generated by $S$. This subgroup consists precisely of all products of elements of $S'$.

Assume that $S$ generates $G$. The elements of $S$ are then called generators.
Since $\varphi$ is a surjective homomorphism, the First Isomorphism Theorem [Chapter 2
(10.9)] tells us that $G$ is isomorphic to the quotient group $F/N$, where $N = \ker \varphi$.
The elements of $N$ are called relations among the generators. They are equivalence
classes of words $w$ with the property that the corresponding product in $G$ is 1:

$\varphi(w) = 1$ or $w = 1$ in $G$.

In the special case that $N = \{1\}$, $\varphi$ is an isomorphism. In this case $G$ is called a free
group too.

If we know a set of generators and also all the relations, then we can compute
in the isomorphic group $F/N$ and hence in our group $G$. But the subgroup $N$ will be
infinite unless $G$ is free, so we can’t list all its elements. Rather, a set of words

$R = \{r_1, r_2, \ldots\}$

is called a set of defining relations for $G$ if $R \subset N$ and if $N$ is the smallest normal
subgroup containing $R$. This means that $N$ is generated by the subset consisting of all
the words in $R$ and also all their conjugates.

It might seem more systematic to require the defining relations to be generators
for the group $N$. But remember that the kernel of the homomorphism $F \to G$
defined by a set of generators is always a normal subgroup, so there is no need to
make the list of defining relations longer. If we know that some relation $r = 1$ holds
in $G$, then we can conclude that $grg^{-1} = 1$ holds in $G$ too, simply by multiplying
both sides of the equation on the left and right by $g$ and $g^{-1}$.

We already know a few examples of generators and relations, such as the dihedral
group $D_n$ [Chapter 5 (3.6), (3.7)]. It is generated by the two elements $x, y$, with
relations

(8.2) $x^n = 1$, $y^2 = 1$, $xyxy = 1$.


(8.3) Proposition. The elements $x^n, y^2, xyxy$ form a set of defining relations for
the dihedral group.


This proposition is essentially what was checked in Chapter 5 (3.6). But to
prove it formally, and to work freely with the concept of generators and relations,
we will need what is called the mapping property of quotient groups. It is a
generalization of the First Isomorphism Theorem:

(8.4) Proposition. Mapping property of quotient groups: Let $N$ be a normal subgroup
of $G$, let $\bar{G} = G/N$, and let $\pi$ be the canonical map $G \to \bar{G}$ defined by
$\pi(a) = \bar{a} = aN$. Let $\varphi: G \to G'$ be a homomorphism whose kernel contains $N$.
There is a unique homomorphism $\bar\varphi: \bar{G} \to G'$ such that $\bar\varphi\pi = \varphi$:

$$\begin{array}{ccc} G & \xrightarrow{\ \varphi\ } & G' \\ {\scriptstyle\pi}\downarrow & \nearrow{\scriptstyle\bar\varphi} & \\ \bar{G} & & \end{array}$$

This map is defined by the rule $\bar\varphi(\bar{a}) = \varphi(a)$.

Proof. To define a map $\bar\varphi: \bar{G} \to G'$, we must define $\bar\varphi(\alpha)$ for every element
$\alpha$ of $\bar{G}$. To do this, we represent $\alpha$ by an element $a \in G$, choosing $a$ so that
$\alpha = \pi(a)$. In the bar notation, this means that $\alpha = \bar{a}$. Now since we want our map
$\bar\varphi$ to satisfy the relation $\bar\varphi(\pi(a)) = \varphi(a)$, there is no choice but to define $\bar\varphi$ by the
rule $\bar\varphi(\alpha) = \varphi(a)$, as asserted in the proposition. To show that this is permissible,
we must show that the value we obtained for $\bar\varphi(\alpha)$, namely $\varphi(a)$, depends only on $\alpha$
and not on our choice of the representative $a$. This is often referred to as showing
that our map is “well-defined.”

Let $a$ and $a'$ be two elements of $G$ such that $\bar{a} = \bar{a}' = \alpha$. The equality $\bar{a} = \bar{a}'$
means that $aN = a'N$, or [Chapter 2 (5.13)] that $a' \in aN$. So $a' = an$ for some
$n \in N$. Since $N \subset \ker \varphi$ by hypothesis, $\varphi(n) = 1$. Thus $\varphi(a') = \varphi(a)\varphi(n) =
\varphi(a)$, as required.

Finally, the map $\bar\varphi$ is a homomorphism because $\bar\varphi(\bar{a})\bar\varphi(\bar{b}) = \varphi(a)\varphi(b) =
\varphi(ab) = \bar\varphi(\overline{ab})$. □


Proof of Proposition (8.3). We showed in Chapter 5 (3.6) that $D_n$ is generated
by elements $x, y$ which satisfy (8.2). Therefore there is a surjective map
$\varphi: F \to D_n$ from the free group on $x, y$ to $D_n$, and $R = \{x^n, y^2, xyxy\}$ is contained
in $\ker \varphi$. Let $N$ be the smallest normal subgroup of $F$ containing $R$. Then since $\ker \varphi$
is a normal subgroup which contains $R$, $N \subset \ker \varphi$. The mapping property of
quotients gives us a homomorphism $\bar\varphi: F/N \to D_n$. If we show that $\bar\varphi$ is bijective, the
proposition will be proved.

Note that since $\varphi$ is surjective, $\bar\varphi$ is too. Also, in $F/N$ the relations $x^n = 1$,
$y^2 = 1$, and $xyxy = 1$ hold. Using them, we can put any word in $x, y$ into the form
$x^iy^j$, with $0 \le i \le n - 1$ and $0 \le j \le 1$. This shows that $F/N$ has at most $2n$
elements. Since $|D_n| = 2n$, it follows that $\bar\varphi$ is bijective, as required. □
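The step "we can put any word in $x, y$ into the form $x^iy^j$" can be made concrete. The relations $xyxy = 1$ and $y^2 = 1$ give $yx = x^{-1}y$ and $yx^{-1} = xy$, so a letter can be absorbed into a normal form one at a time. The following sketch assumes that uppercase letters encode inverses; the function names and the choice $n = 4$ are illustrative only.

```python
from itertools import product

def normal_form(word, n):
    """Return (i, j) with word = x^i y^j in <x, y; x^n, y^2, xyxy>."""
    i, j = 0, 0
    for s in word:
        if s in "yY":                 # y^-1 = y because y^2 = 1
            j ^= 1
        else:
            step = 1 if s == "x" else -1
            if j:                     # x^i y x = x^(i-1) y, since yx = x^-1 y
                step = -step
            i = (i + step) % n        # x^n = 1
    return i, j

def count_normal_forms(n, max_length):
    """Number of distinct normal forms among all words up to a given length."""
    forms = set()
    for length in range(max_length + 1):
        for letters in product("xXyY", repeat=length):
            forms.add(normal_form("".join(letters), n))
    return len(forms)
```

For $n = 4$, the words of length at most 4 already produce all $2n = 8$ normal forms, and the defining relation `xyxy` reduces to `(0, 0)`, the normal form of the identity.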

We will use the notation

(8.5) $\langle x_1, \ldots, x_m;\ r_1, \ldots, r_k \rangle$

to denote the group generated by elements $x_1, \ldots, x_m$, with defining relations
$r_1, \ldots, r_k$. Thus

(8.6) $D_n = \langle x, y;\ x^n, y^2, xyxy \rangle$.

As a new example, let us consider the group generated by $x, y$, with the single
relation $xyx^{-1}y^{-1} = 1$. If $x, y$ are elements of a group, then

(8.7) $xyx^{-1}y^{-1}$

is called their commutator. This commutator is important because it is equal to 1 if
and only if $x$ and $y$ commute. This is seen by multiplying both sides of the equation
$xyx^{-1}y^{-1} = 1$ on the right by $yx$. So if we impose the relation $xyx^{-1}y^{-1} = 1$ on the
free group, we will obtain a group in which $x$ and $y$ commute. Thus if $N$ is the
smallest normal subgroup containing the commutator $xyx^{-1}y^{-1}$ and if $G = F/N$,
then the residues of $x$ and $y$ are commuting elements of $G$. This forces any two
elements of $G$ to commute.

(8.8) Proposition. Let $F$ be the free group on $x, y$ and let $N$ be the smallest normal
subgroup generated by the commutator $xyx^{-1}y^{-1}$. The quotient group $G = F/N$
is abelian.

Proof. Let us denote the residues of the generators $x, y$ in $G$ by the same letters.
Since the commutator is in $N$, the elements $x, y$ commute in $G$. Then $x$ commutes
with $y^{-1}$ too. For $xy^{-1}$ and $y^{-1}x$ both become equal to $x$ when multiplied on the
left by $y$. So by the Cancellation Law, they are equal. Also, $x$ obviously commutes
with $x$ and with $x^{-1}$. So $x$ commutes with any word in $S' = \{x, x^{-1}, y, y^{-1}\}$. So does $y$.
It follows by induction that any two words in $S'$ commute. Since $x, y$ generate the
group, $G$ is commutative. □

Note this consequence: The commutator $uvu^{-1}v^{-1}$ of any two words in $S'$ is in
the normal subgroup generated by the single commutator $xyx^{-1}y^{-1}$, because, since
$u, v$ commute in $G$, the commutator represents the identity element in $G$.

The group $G$ constructed above is called the free abelian group on the set
$\{x, y\}$, because the elements $x, y$ satisfy no relations except those implied by the
group axioms and the commutative law.

In the examples we have seen, knowledge of the relations allows us to compute
easily in the group. This is somewhat misleading, because computation with a given
set of relations is often not easy at all. For example, suppose that we change the
defining relations (8.6) for the dihedral group slightly, substituting $y^3$ for $y^2$:

(8.9) $G = \langle x, y;\ x^n, y^3, xyxy \rangle$.

This group is much more complicated. When $n > 5$, it is an infinite group.

Things become very difficult when the relations are complicated enough. Suppose
that we are given a set $R$ of words, and let $N$ be the smallest normal subgroup
containing $R$. Let $w, w'$ be any other words. Then we can pose the problem of
deciding whether or not $w$ and $w'$ represent the same element of $F/N$. This is called
the word problem for groups, and it is known that there is no general procedure for
deciding it in a predictable length of time. Nevertheless, generators and relations
allow efficient computation in many cases, and so they are a useful tool. We will
discuss an important method for computation, the Todd-Coxeter Algorithm, in the
next section.

Recapitulating, when we speak of a group defined by generators $S$ and relations
$R$, we mean the quotient group $F/N$, where $F$ is the free group on $S$ and $N$ is the
smallest normal subgroup of $F$ containing $R$. Note that any set $R$ of relations will
define a group, because $F/N$ is always defined. The larger $R$ is, the larger $N$ becomes
and the more collapsing takes place in the homomorphism $\pi: F \to F/N$. If $R$ gets
“too big,” the worst that can happen is that $N = F$, hence that $F/N$ is the trivial
group. Thus there is no such thing as a contradictory set of relations. The only
problems which may arise occur when $F/N$ becomes too small, which happens when the
relations cause more collapsing than was expected.


9. THE TODD-COXETER ALGORITHM

Let $H$ be a subgroup of a finite group $G$. The Todd-Coxeter Algorithm which is
described in this section is an amazing direct method of counting the cosets of $H$ in $G$
and of determining the operation of $G$ on the set of cosets. Since we know that any
operation on an orbit looks like an operation on cosets [Chapter 5 (6.3)], the
algorithm is really a method of describing any group operation.

In order to compute explicitly, both the group $G$ and the subgroup $H$ must be
given to us in an explicit way. So we consider a group

(9.1) $G = \langle x_1, \ldots, x_m;\ r_1, \ldots, r_k \rangle$

presented by generators $x_1, \ldots, x_m$ and explicitly given relations $r_1, \ldots, r_k$, as in the
previous section. Thus $G$ is realized as the quotient group $F/N$, where $F$ is the free
group on the set $\{x_1, \ldots, x_m\}$ and $N$ is the smallest normal subgroup containing
$\{r_1, \ldots, r_k\}$. We also assume that the subgroup $H$ of $G$ is given to us explicitly by a set
of words

(9.2) $\{h_1, \ldots, h_s\}$

in the free group $F$, whose images in $G$ generate $H$.

Let us work out a specific example to begin with. We take for $G$ the group generated
by three elements $x, y, z$, with relations $x^3, y^2, z^2, xyz$, and for $H$ the cyclic
subgroup generated by $z$:

(9.3) $G = \langle x, y, z;\ x^3, y^2, z^2, xyz \rangle$, $H = \{z\}$.

Since we will be determining the operation on cosets, which is a permutation
representation [Chapter 5 (8.1)], we must decide how to write permutations. We
will use the cycle notation of Section 6. This forces us to work with right cosets $Hg$
rather than with left cosets, because we want $G$ to operate on the right. Let us denote
the set of right cosets of $H$ in $G$ by $\mathcal{C}$. We must also decide how to describe the
operation of our group explicitly, and the easiest way is to go back to the free group
again, that is, to describe the permutations associated to the given generators $x, y, z$.

The operations of the generators on the set of cosets will satisfy these rules:

(9.4) Rules.

1. The operation of each generator ($x, y, z$ in our example) is a permutation.

2. The relations ($x^3, y^2, z^2, xyz$ in our example) operate trivially.

3. The generators of $H$ ($z$ in our example) fix the coset $H1$.

4. The operation on cosets is transitive.

The first rule is a general property of group operations. It follows from the fact that
group elements are invertible. We list it instead of mentioning inverses of the
generators explicitly. The second rule holds because the relations represent 1 in $G$, and it
is the group $G$ which operates. Rules 3 and 4 are special properties of the operation
on cosets.

We now determine the coset representation by applying only these rules. Let us
use indices 1, 2, 3, ... to denote the cosets, with 1 standing for the coset $H1$. Since
we don’t know how many cosets there are, we don’t know how many indices we
need. We will add new ones as necessary.

First, Rule 3 tells us that $z$ sends 1 to itself: $1z = 1$. This exhausts the information
in Rule 3, so Rules 1 and 2 take over. Rule 4 will appear only implicitly.

We don’t know what $x$ does to the index 1. Let’s guess that $1x \ne 1$ and assign
a new index, say $1x = 2$. Continuing with the generator $x$, we don’t know $2x$, so
we assign a third index: $1x^2 = 2x = 3$. Rule 2 now comes into play. It tells us that
$x^3$ fixes every index. Therefore $1x^3 = 3x = 1$. It is customary to sum up this
information in a table



     x   x   x
   1   2   3   1

which exhibits the operation of $x$ on the three indices. The relation $xxx$ appears on
the top, and Rule 2 is reflected in the fact that the same index 1 appears at both ends.
At this point, we have determined the operation of $x$ on the three indices 1, 2, 3, except
for one thing: We don’t yet know that these indices represent distinct cosets.

We now ask for the operation of $y$ on the index 1. Again, we don’t know it, so
we assign a new index, say $1y = 4$. Rule 2 applies again. Since $y^2$ operates trivially,
we know that $1y^2 = 4y = 1$:

     y   y
   1   4   1

The remaining relation is $xyz$. We know that $1x = 2$, but we don’t yet know
$2y$. So we set $1xy = 2y = 5$. Rule 2 then tells us that $1xyz = 5z = 1$:

     x   y   z
   1   2   5   1

We now apply Rule 1: The operation of each group element is a permutation of the
indices. We have determined that $1z = 1$ and also that $5z = 1$. It follows that
$5 = 1$. We eliminate the index 5, replacing it by 1. This in turn tells us that $2y = 1$.
On the other hand, we have already determined that $4y = 1$. So $4 = 2$ by Rule 1,
and we eliminate 4.

The entries in the table below have now been determined (a dot marks an entry
which is not yet known):

     x   x   x   y   y   z   z   x   y   z
   1   2   3   1   2   1   1   1   2   1   1
   2   3   1   2   1   2   .   2   3   .   2
   3   1   2   3   .   3   .   3   1   2   3

The bottom right corner shows that $2z = 3$. This determines the rest of the table.
There are three indices, and the operation is

$x = (123)$, $y = (12)$, $z = (23)$.

Since there are three indices, we conclude that there are three cosets and that
the index of $H$ in $G$ is 3. We also conclude that the order of $H$ is 2, and hence that $G$
has order 6. For $z^2 = 1$ is one of our relations; therefore $z$ has order 1 or 2, and
since $z$ does not operate trivially on the indices, $z \ne 1$. The three permutations listed
above generate the symmetric group, so the permutation representation is an
isomorphism from $G$ onto $S_3$.

Of course, these conclusions depend on our knowing that the permutation
representation we have constructed is the right one. We will show this at the end of the
section. Let’s compute a few more examples first.

(9.5) Example. Consider the tetrahedral group $T$ of the 12 rotational symmetries of
a regular tetrahedron (see Section 9 of Chapter 5). If we let $y$ and $x$ denote
counterclockwise rotations by $2\pi/3$ about a vertex and the center of a face as shown below,
then $yx = z$ is the rotation by $\pi$ about an edge. Thus the relations

(9.6) $x^3 = 1$, $y^3 = 1$, $yxyx = 1$

hold in $T$.

[Figure: a regular tetrahedron with the rotations $x$, $y$, and $z$ marked.]



Let us show that (9.6) is a complete set of relations for $T$. To do so, we consider
the group $G = \langle y, x;\ y^3, x^3, yxyx \rangle$ defined by these relations. Since the
relations (9.6) hold in $T$, the mapping property of quotient groups provides a
homomorphism $\varphi: G \to T$. This map is surjective because, as is easily seen, $y$ and $x$
generate $T$. We need only show that $\varphi$ is injective. We will do this by showing that
the order of the group $G$ is 12.

It is possible to analyze the relations directly, but they aren’t particularly easy
to work with. We could also compute the order of $G$ by enumerating the cosets of
the trivial subgroup $H = \{1\}$. This is not efficient either. It is better to use a
nontrivial subgroup $H$ of $G$, such as the one generated by $y$. This subgroup has order at
most 3 because $y^3 = 1$. If we show that its order is 3 and that its index in $G$ is 4, it
will follow that $G$ has order 12, and we will be done.

Here is the resulting table. To fill it in, work from both ends of the relations.

     x   x   x   y   y   y   y   x   y   x
   1   2   3   1   1   1   1   1   2   3   1
   2   3   1   2   3   4   2   3   1   1   2
   3   1   2   3   4   2   3   4   4   2   3
   4   4   4   4   2   3   4   2   3   4   4


Thus the permutation representation is

(9.7) $x = (123)$, $y = (234)$.

Since there are four indices, the index of $H$ is 4. Also, notice that $y$ does have order
precisely 3. For since $y^3 = 1$, the order is at most 3, and since the permutation
(234) associated to $y$ has order 3, it is at least 3. So the order of the group is 12, as
predicted. Incidentally, we can derive the fact that $T$ is isomorphic to the alternating
group $A_4$ by verifying that the permutations (9.7) generate that group. □
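The claims of this example can be spot-checked by machine. The sketch below is only a verification aid under stated assumptions: permutations act on the right, as in this section, so a word is applied to an index letter by letter from left to right; the function names are hypothetical.

```python
def act(index, word, perms):
    """Apply a word to an index, letter by letter from left to right."""
    for s in word:
        index = perms[s][index]
    return index

def relations_hold(perms, relations):
    """Rule 2: every relation operates trivially on every index."""
    indices = next(iter(perms.values()))
    return all(act(i, r, perms) == i for r in relations for i in indices)

def generated_group(perms):
    """All permutations obtainable as products of the given generators."""
    indices = sorted(next(iter(perms.values())))
    identity = tuple(indices)              # entry k is the image of indices[k]
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for p in perms.values():
            h = tuple(p[v] for v in g)     # g followed by a generator
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

# The permutation representation (9.7): x = (123), y = (234).
perms = {"x": {1: 2, 2: 3, 3: 1, 4: 4},
         "y": {1: 1, 2: 3, 3: 4, 4: 2}}
group = generated_group(perms)
```

The relations `xxx`, `yyy`, `yxyx` operate trivially, and the two permutations generate a group of 12 elements in which the stabilizer of the index 1 has order 3, consistent with index 4 and order 12.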

(9.8) Example. We modify the relations (9.6) slightly. Let $G$ be generated by $x, y$,
with relations

$x^3 = 1$, $y^3 = 1$, $yxy^2x = 1$,

and let $H$ be the subgroup generated by $y$. Here is a start for a table. Since $y^3 = 1$,
we have shortened the last relation, substituting $y^{-1}$ for $y^2$. Clearly, $y^{-1}$ acts as the
inverse of the permutation associated to $y$. The entries in the bottom row have been
determined by working from the right side.


     x   x   x   y   y   y   y   x  y⁻¹  x
   1   2   3   1   1   1   1   1   2   3   1
   2           2           2   3   1   1   2


We rewrite the relation $2y^{-1} = 3$ as $3y = 2$. Since $2y = 3$ as well, it follows that
$3y^2 = 3$ and that $3y^3 = 2$. But $y^3 = 1$, so $3 = 2$, which in turn implies $1 = 2 = 3$.
Since the generators $x, y$ fix 1, there is one coset, and $H = G$. Therefore $x$ is a
power of $y$. The third relation shows that $x^2 = 1$. Combining this fact with the first
relation, we find $x = 1$. Thus $G$ is a cyclic group of order 3. This example illustrates
how relations may collapse the group. □

In our examples, we have taken for H the subgroup generated by one of the
chosen generators of G, but we could also make the computation with a subgroup H 
generated by an arbitrary set of words. They must be entered into the computation 
using Rule 3. 

This method can also be used when G is infinite, provided that the index 
[G:H] is finite. The procedure can not be expected to terminate if there are 
infinitely many cosets. 

We now address the question of why the procedure we have described does
give the operation on cosets. A formal proof of this fact is not possible without first
defining the algorithm formally, and we have not done this. So we will discuss the
question informally. We describe the procedure this way: At a given stage of the
computation, we will have some set $I$ of indices, and the operation of some generators
of the group on some indices will have been determined. Let us call this a partial
operation on $I$. A partial operation need not be consistent with Rules 1, 2, and 3,
but it should be transitive; that is, all indices should be in the “partial orbit” of 1.
This is where Rule 4 comes in. It tells us not to introduce any indices we don’t need.

The starting position is $I = \{1\}$, with no operations assigned. At any stage
there are two possible steps:

(9.9)

(i) We may equate two indices $i, j \in I$ as a consequence of one of the first three
rules, or

(ii) we may choose a generator $x$ and an index $i$ such that $ix$ has not yet been
determined and define $ix = j$, where $j$ is a new index.

We stop the process when an operation has been determined which is consistent with
the rules, that is, when we have a complete, consistent table and the rules hold.

There are two questions to ask: First, will this procedure terminate? Second, if 
it terminates, is the operation the right one? The answer to both questions is yes. It 
can be shown that the process always terminates, provided that the group is finite 
and that preference is given to Step (i). We will not prove this. The more important 
fact for applications is that if the process terminates, the resulting permutation repre¬ 
sentation is the right one. 
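To make the two steps concrete, here is a compact sketch of one possible way to organize them. It is not the formal algorithm, which the text does not define, and its choices are assumptions: uppercase letters encode inverses of generators, indices are numbered from 0 internally and renumbered from 1 at the end, relations are scanned from left to right only, coincidences are recorded in a union-find array, and `max_cosets` is an arbitrary safety bound. The name `coset_table` is hypothetical.

```python
def coset_table(gens, rels, subgens, max_cosets=1000):
    """Enumerate right cosets of H = <subgens> in G = <gens; rels>."""
    inv = {}
    for g in gens:
        inv[g], inv[g.upper()] = g.upper(), g
    parent = [0]      # union-find: parent[c] == c means index c is still in use
    table = [{}]      # table[c][s] = index reached from c by the letter s

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    def step(c, s):
        """Return cs, introducing a new index if cs is undetermined: Step (ii)."""
        c = find(c)
        if s not in table[c]:
            if len(parent) >= max_cosets:
                raise RuntimeError("too many indices; the index may be infinite")
            new = len(parent)
            parent.append(new)
            table.append({inv[s]: c})     # Rule 1: record the inverse as well
            table[c][s] = new
        return find(table[c][s])

    def trace(c, word):
        for s in word:
            c = step(c, s)
        return c

    def equate(a, b):
        """Step (i): identify two indices and everything this forces by Rule 1."""
        pending = [(a, b)]
        while pending:
            a, b = pending.pop()
            a, b = find(a), find(b)
            if a == b:
                continue
            if a > b:
                a, b = b, a
            parent[b] = a                 # eliminate the larger index
            for s, t in table[b].items():
                if s in table[a]:
                    pending.append((table[a][s], t))
                else:
                    table[a][s] = t

    for h in subgens:                     # Rule 3: generators of H fix index 0
        equate(trace(0, h), 0)
    c = 0
    while c < len(parent):
        for w in rels:                    # Rule 2: relations operate trivially
            if find(c) != c:
                break
            equate(trace(c, w), c)
        if find(c) == c:
            for s in inv:                 # make the row of c complete
                step(c, s)
        c += 1
    live = [c for c in range(len(parent)) if find(c) == c]
    number = {c: k + 1 for k, c in enumerate(live)}
    return {g: {number[c]: number[find(table[c][g])] for c in live} for g in gens}
```

For instance, `coset_table("xyz", ["xxx", "yy", "zz", "xyz"], ["z"])` returns one permutation per generator on three indices, matching the count found by hand for (9.3); calling it with the relations of (9.6) and `["y"]` gives four indices. Production implementations, such as those in computer algebra systems, use more refined strategies for choosing where to define new indices.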

(9.10) Theorem. Suppose that a finite number of repetitions of Steps (i) and (ii)
yields a consistent table. Then the table defines a permutation representation which
is isomorphic, by suitable numbering, to the representation on cosets.

Sketch of proof. Let $I^*$ denote the final set of indices, with its operation. We
will prove the theorem by defining a bijective map $\varphi^*: I^* \to \mathcal{C}$ from this set to
the set of cosets which is compatible with the two operations. We define $\varphi^*$
inductively, by defining at each stage a map $\varphi: I \to \mathcal{C}$ from the set of indices
determined at that stage to $\mathcal{C}$, such that $\varphi$ is compatible with the partial operation on $I$. To
start, $\varphi_0: \{1\} \to \mathcal{C}$ sends $1 \mapsto H1$. Now suppose that $\varphi: I \to \mathcal{C}$ has been defined,
and let $I'$ be the result of applying one of Steps (9.9) to $I$. In case of Step (ii), there
is no difficulty in extending $\varphi$ to a map $\varphi': I' \to \mathcal{C}$. We simply define
$\varphi'(k) = \varphi(k)$ if $k \ne j$, and $\varphi'(j) = \varphi(i)x$. Next, suppose that we use Step (i) to
equate two indices, say $i, j$, so that $I$ is collapsed to form the new index set $I'$. Then
the next lemma allows us to define the map $\varphi': I' \to \mathcal{C}$:

(9.11) Lemma. Suppose that a map $\varphi: I \to \mathcal{C}$ is given, compatible with a partial
operation on $I$. Let $i, j \in I$, and suppose that one of the Rules 1, 2, or 3 forces
$i = j$. Then $\varphi(i) = \varphi(j)$.

Proof. This is true because, as we have already remarked, the operation on
cosets does satisfy all of the Rules (9.4). So if the rules force $i = j$, they also force
$\varphi(i) = \varphi(j)$. □

It remains to prove that the map $\varphi^*: I^* \to \mathcal{C}$ is bijective. To do this, we
construct the inverse map $\psi^*: \mathcal{C} \to I^*$, using the following lemma:

(9.12) Lemma. Let $S$ be a set on which $G$ operates, and let $s \in S$ be an element
stabilized by $H$. There is a unique map $\psi: \mathcal{C} \to S$ which is compatible with the
operations on the two sets and which sends $H1 \mapsto s$.

Proof. This proof repeats that of (6.4) in Chapter 5, except that we have
changed to right operations. Since $g$ sends $H \mapsto Hg$ and since we want $\psi(Hg) =
\psi(H)g$, we must try to set $\psi(Hg) = sg$. This proves uniqueness of the map $\psi$. To
prove existence, we first check that the rule $\psi(Hg) = sg$ is well-defined: If
$Ha = Hb$, then $ba^{-1} \in H$. By hypothesis, $ba^{-1}$ stabilizes $s$, so $sa = sb$. Finally, $\psi$ is
compatible with the operations of $G$ because $\psi(Hga) = sga = (sg)a = \psi(Hg)a$. □

Now, to prove the bijectivity of $\varphi^*$, we use the lemma to construct a map
$\psi^*: \mathcal{C} \to I^*$. Consider the composed map $\varphi^*\psi^*: \mathcal{C} \to \mathcal{C}$. It sends $H1 \mapsto H1$.
We apply the lemma again, substituting $\mathcal{C}$ for $S$. The uniqueness assertion of the
lemma tells us that $\varphi^*\psi^*$ is the identity map. On the other hand, since the operation
on $I^*$ is transitive and since $\psi^*$ is compatible with the operations, $\psi^*$ must be
surjective. It follows that $\varphi^*$ and $\psi^*$ are bijective. □


The axiomatic method has many advantages over honest work. 

Bertrand Russell 


EXERCISES 


1. The Operations of a Group on Itself 


1. Does the rule g, define an operation of G on itself? 

2. Let $H$ be a subgroup of a group $G$. Then $H$ operates on $G$ by left multiplication. Describe
the orbits for this operation.

3. Prove the formula $|G| = |Z| + \sum |C|$, where the sum is over the conjugacy classes
containing more than one element and where $Z$ is the center of $G$.

4. Prove the Fixed Point Theorem (1.12). 

5. Determine the conjugacy classes in the group M of motions of the plane. 

6. Rule out as many of the following as possible as Class Equations for a group of order 10:

$1 + 1 + 1 + 2 + 5$, $1 + 2 + 2 + 5$, $1 + 2 + 3 + 4$, $1 + 1 + 2 + 2 + 2 + 2$.

7. Let $F = \mathbb{F}_5$. Determine the order of the conjugacy class of [matrix; only the entry 2 is legible] in $GL_2(\mathbb{F}_5)$.

8. Determine the Class Equation for each of the following groups.

(a) the quaternion group, (b) the Klein four group, (c) the dihedral group $D_5$,
(d) (e) $D_n$, (f) the group of upper triangular matrices in $GL_2(\mathbb{F}_3)$,
(g) $SL_2(\mathbb{F}_3)$.

9. Let $G$ be a group of order $n$, and let $F$ be any field. Prove that $G$ is isomorphic to a
subgroup of $GL_n(F)$.

10. Determine the centralizer in $GL_3(\mathbb{R})$ of each matrix.

[Matrices (a) through (f): entries not recoverable.]

11. Determine all finite groups which contain at most three conjugacy classes.

12. Let $N$ be a normal subgroup of a group $G$. Suppose that $|N| = 5$ and that $|G|$ is odd.
Prove that $N$ is contained in the center of $G$.

*13. (a) Determine the possible Class Equations for groups of order 8.

(b) Classify groups of order 8.

14. Let $Z$ be the center of a group $G$. Prove that if $G/Z$ is a cyclic group, then $G$ is abelian
and hence $G = Z$.

*15. Let G be a group of order 35. 

(a) Suppose that G operates nontrivially on a set of five elements. Prove that G has a 
normal subgroup of order 7. 

(b) Prove that every group of order 35 is cyclic. 

2. The Class Equation of the Icosahedra! Group 

1. Identify the intersection IDO when the dodecahedron and cube are as in Figure (2.7), 

2. Two tetrahedra can be inscribed into a cube C, each one using half the vertices. Relate 
this to the inclusion A 4 C S 4 ^ 

3. Does / contain a subgroup T1 D 6 ? D 3 ? 

4* Prove that the icosahedral group has no subgroup of order 30. 

5. Prove or disprove: A s is the only proper normal subgroup of . 

6. Prove that no group of order p e ， where p is prime and e > l, is simple, 

7* Prove or disprove: An abelian group is simple if and only if it has prime order, 

8. (a) Determine the Class Equation for the group T of rotations of a tetrahedron. 

(b) What is the center of T1 

(c) Prove that T has exactly one subgroup of order 4. 

(d) Prove that T has no subgroup of order 6. 

9. (a) Determine the Class Equation for the octahedral group O. 

(b) There are exactly two proper normal subgroups of O. Find them, show that they are
normal, and show that there are no others.

10. Prove that the tetrahedral group T is isomorphic to the alternating group $A_4$, and that the
octahedral group O is isomorphic to the symmetric group $S_4$. Begin by finding sets of
four elements on which these groups operate.

11. Prove or disprove: The icosahedral group is not a subgroup of the group of real upper
triangular 2 × 2 matrices.

*12. Prove or disprove: A nonabelian simple group cannot operate nontrivially on a set
containing fewer than five elements.

3. Operations on Subsets 

1. Let S be the set of subsets of order 2 of the dihedral group $D_3$. Determine the orbits for
the action of $D_3$ on S by conjugation.

2. Determine the orbits for left multiplication and for conjugation on the set of subsets of
order 3 of $D_3$.

3. List all subgroups of the dihedral group $D_4$, and divide them into conjugacy classes.

4. Let H be a subgroup of a group G. Prove that the orbit of the left coset gH for the
operation of conjugation contains the right coset Hg.

5. Let U be a subset of a finite group G, and suppose that |U| and |G| have no common
factor. Is the stabilizer of U trivial for the operation of conjugation?

6. Consider the operation of left multiplication by G on the set of its subsets. Let U be a
subset whose orbit {gU} partitions G. Let H be the unique subset in this orbit which
contains 1. Prove that H is a subgroup of G and that the sets gU are its left cosets.

7. Let H be a subgroup of a group G. Prove or disprove: The normalizer N(H) is a normal
subgroup of the group G.

8. Let $H \subset K \subset G$ be groups. Prove that H is normal in K if and only if $K \subset N(H)$.

9. Prove that the subgroup B of upper triangular matrices in $GL_n(\mathbb{R})$ is conjugate to the
group L of lower triangular matrices.

10. Let B be the subgroup of $G = GL_n(\mathbb{C})$ of upper triangular matrices, and let $U \subset B$ be
the set of upper triangular matrices with diagonal entries 1. Prove that B = N(U) and
that B = N(B).

*11. Let $S_n$ denote the subgroup of $GL_n(\mathbb{R})$ of permutation matrices. Determine the
normalizer of $S_n$ in $GL_n(\mathbb{R})$.

12. Let S be a finite set on which a group G operates transitively, and let U be a subset of S.
Prove that the subsets gU cover S evenly, that is, that every element of S is in the same
number of sets gU.

13. (a) Let H be a normal subgroup of G of order 2. Prove that H is in the center of G.

(b) Let H be a normal subgroup of prime order p in a finite group G. Suppose that p is
the smallest prime dividing |G|. Prove that H is in the center Z(G).

*14. Let H be a proper subgroup of a finite group G. Prove that the union of the conjugates of
H is not the whole group G.

15. Let K be a normal subgroup of order 2 of a group G, and let $\bar{G} = G/K$. Let $\bar{C}$ be a
conjugacy class in $\bar{G}$. Let S be the inverse image of $\bar{C}$ in G. Prove that one of the following
two cases occurs.

(a) S = C is a single conjugacy class and $|C| = 2|\bar{C}|$.

(b) $S = C_1 \cup C_2$ is made up of two conjugacy classes and $|C_1| = |C_2| = |\bar{C}|$.

16. Calculate the double cosets HgH of the subgroup H = {1, y} in the dihedral group $D_n$.
Show that each double coset has either two or four elements.

17. Let H, K be subgroups of G, and let H' be a conjugate subgroup of H. Relate the double
cosets H'gK and HgK.

18. What can you say about the order of a double coset HgK?

4. The Sylow Theorems


1. How many elements of order 5 are contained in a group of order 20?

2. Prove that no group of order pq, where p and q are prime, is simple.

3. Prove that no group of order $p^2 q$, where p and q are prime, is simple.

4. Prove that the set of matrices $\begin{bmatrix} 1 & a \\ & c \end{bmatrix}$, where $a, c \in \mathbb{F}_7$ and c = 1, 2, 4, forms a group of
the type presented in (4.9b) (and that therefore such a group exists).

5. Find Sylow 2-subgroups in the following cases:

(a) $D_{10}$ (b) T (c) O (d) I.

6. Find a Sylow p-subgroup of $GL_2(\mathbb{F}_p)$.


*7. (a) Let H be a subgroup of G of prime index p. What are the possible numbers of
conjugate subgroups of H?

(b) Suppose that p is the smallest prime integer which divides |G|. Prove that H is a
normal subgroup.

*8. Let H be a Sylow p-subgroup of G, and let K = N(H). Prove or disprove: K = N(K).

9. Let G be a group of order $p^e m$. Prove that G contains a subgroup of order $p^r$ for every
integer $r \le e$.

10. Let n = pm be an integer which is divisible exactly once by p, and let G be a group
of order n. Let H be a Sylow p-subgroup of G, and let S be the set of all Sylow
p-subgroups. How does S decompose into H-orbits?

*11. (a) Compute the order of $GL_n(\mathbb{F}_p)$.

(b) Find a Sylow p-subgroup of $GL_n(\mathbb{F}_p)$.

(c) Compute the number of Sylow p-subgroups.

(d) Use the Second Sylow Theorem to give another proof of the First Sylow Theorem.

*12. Prove that no group of order 224 is simple.

13. Prove that if G has order $n = p^e a$ where $1 \le a < p$ and $e \ge 1$, then G has a proper
normal subgroup.

14. Prove that the only simple groups of order < 60 are groups of prime order.

15. Classify groups of order 33. 

16. Classify groups of order 18. 

17. Prove that there are at most five isomorphism classes of groups of order 20. 

*18. Let G be a simple group of order 60. 

(a) Prove that G contains six Sylow 5-subgroups, ten Sylow 3-subgroups, and five Sylow
2-subgroups.

(b) Prove that G is isomorphic to the alternating group $A_5$.

5. The Groups of Order 12 

1. Determine the Class Equations of the groups of order 12. 

2. Prove that a group of order n = 2p, where p is prime, is either cyclic or dihedral.

*3. Let G be a group of order 30.

(a) Prove that either the Sylow 5-subgroup K or the Sylow 3-subgroup H is normal.

(b) Prove that HK is a cyclic subgroup of G.

(c) Classify groups of order 30.

4. Let G be a group of order 55.

(a) Prove that G is generated by two elements x, y, with the relations $x^{11} = 1$, $y^5 = 1$,
$yxy^{-1} = x^r$, for some r, $1 \le r < 11$.

(b) Prove that the following values of r are not possible: 2, 6, 7, 8, 10.

(c) Prove that the remaining values are possible, and that there are two isomorphism
classes of groups of order 55.

6. Computation in the Symmetric Group 

1. Verify the products (6.9).

2. Prove explicitly that the permutation (123)(45) is conjugate to (241)(35).

3. Let p, q be permutations. Prove that the products pq and qp have cycles of equal sizes.

4. (a) Does the symmetric group $S_7$ contain an element of order 5? of order 10? of order 15?

(b) What is the largest possible order of an element of $S_7$?

5. Show how to determine whether a permutation is odd or even when it is written as a
product of cycles.

6. Prove or disprove: The order of a permutation is the least common multiple of the orders
of the cycles which make it up.

7. Is the cyclic subgroup H of $S_n$ generated by the cycle (12345) a normal subgroup?

*8. Compute the number of permutations in $S_n$ which do not leave any index fixed.

9. Determine the cycle decomposition of the permutation $i \mapsto n - i$.

10. (a) Prove that every permutation p is a product of transpositions.

(b) How many transpositions are required to write the cycle $(1\,2\,3 \cdots n)$?

(c) Suppose that a permutation is written in two ways as a product of transpositions, say
$p = \tau_1 \tau_2 \cdots \tau_m$ and $p = \tau_1' \tau_2' \cdots \tau_n'$. Prove that m and n are both odd or else they are
both even.

11. What is the centralizer of the element (12) of $S_4$?

12. Find all subgroups of order 4 of the symmetric group $S_4$. Which are normal?

13. Determine the Class Equation of $A_4$.

14. (a) Determine the number of conjugacy classes and the Class Equation for $S_5$.

(b) List the conjugacy classes in $A_5$, and reconcile this list with the list of conjugacy
classes in the icosahedral group [see (2.2)].

15. Prove that the transpositions (12), (23), ..., (n-1, n) generate the symmetric group $S_n$.

16. Prove that the symmetric group $S_n$ is generated by the cycles $(1\,2 \cdots n)$ and (12).

17. (a) Show that the product of two transpositions (ij)(kl) can always be written as a
product of 3-cycles. Treat the case that some indices are equal too.

(b) Prove that the alternating group $A_n$ is generated by 3-cycles, if $n \ge 3$.

18. Prove that if a proper normal subgroup of $S_n$ contains a 3-cycle, it is $A_n$.

*19. Prove that $A_n$ is simple for all $n \ge 5$.

*20. Prove that $A_n$ is the only subgroup of $S_n$ of index 2.

21. Explain the miraculous coincidence at the end of the section in terms of the opposite
group (Chapter 2, Section 1, exercise 12).

7. The Free Group 


1. Prove or disprove: The free group on two generators is isomorphic to the product of two
infinite cyclic groups.

2. (a) Let F be the free group on x, y. Prove that the two elements $u = x^2$ and $v = y^3$
generate a subgroup of F which is isomorphic to the free group on u, v.

(b) Prove that the three elements $u = x^2$, $v = y^2$, and z = xy generate a subgroup
isomorphic to the free group on u, v, z.

3. We may define a closed word in S' to be the oriented loop obtained by joining the ends
of a word. Thus

[Figure: the letters of a word arranged around a circle.]

represents a closed word, if we read it clockwise. Establish a bijective correspondence
between reduced closed words and conjugacy classes in the free group.
4. Let p be a prime integer. Let N be the number of words of length p in a finite set S.
Show that is divisible by p. 

8. Generators and Relations 

1. Prove that two elements a, b of a group generate the same subgroup as $bab^2$, $bab^3$.

2. Prove that the smallest normal subgroup of a group G containing a subset S is generated
as a subgroup by the set $\{gsg^{-1} \mid g \in G,\ s \in S\}$.

3. Prove or disprove: $y^2 x^2$ is in the normal subgroup generated by xy and its conjugates.

4. Prove that the group generated by x, y, z with the single relation $yxyz^{-2} = 1$ is actually a
free group.

5. Let S be a set of elements of a group G, and let $\{r_i\}$ be some relations which hold among
the elements S in G. Let F be the free group on S. Prove that the map $F \to G$ (8.1)
factors through F/N, where N is the normal subgroup generated by $\{r_i\}$.

6. Let G be a group with a normal subgroup N. Assume that N and G/N are both cyclic
groups. Prove that G can be generated by two elements.

7. A subgroup H of a group G is called characteristic if it is carried to itself by all
automorphisms of G.

(a) Prove that every characteristic subgroup is normal.

(b) Prove that the center Z of a group G is a characteristic subgroup.

(c) Prove that the subgroup H generated by all elements of G of order n is characteristic.

8. Determine the normal subgroups and the characteristic subgroups of the quaternion
group.

9. The commutator subgroup C of a group G is the smallest subgroup containing all
commutators.

(a) Prove that the commutator subgroup is a characteristic subgroup.

(b) Prove that G/C is an abelian group.

10. Determine the commutator subgroup of the group M of motions of the plane.

11. Prove by explicit computation that the commutator $x(yz)x^{-1}(yz)^{-1}$ is in the normal
subgroup generated by the two commutators $xyx^{-1}y^{-1}$ and $xzx^{-1}z^{-1}$ and their conjugates.

12. Let G denote the free abelian group $\langle x, y;\ xyx^{-1}y^{-1} \rangle$ defined in (8.8). Prove the universal
property of this group: If u, v are elements of an abelian group A, there is a unique
homomorphism $\varphi: G \to A$ such that $\varphi(x) = u$, $\varphi(y) = v$.

13. Prove that the normal subgroup in the free group $\langle x, y \rangle$ which is generated by the single
commutator $xyx^{-1}y^{-1}$ is the commutator subgroup.

14. Let N be a normal subgroup of a group G. Prove that G/N is abelian if and only if N
contains the commutator subgroup of G.

15. Let $\varphi: G \to G'$ be a surjective group homomorphism. Let S be a subset of G such that
$\varphi(S)$ generates G', and let T be a set of generators of ker $\varphi$. Prove that $S \cup T$ generates
G.

16. Prove or disprove: Every finite group G can be presented by a finite set of generators and
a finite set of relations.

17. Let G be the group generated by x, y, z, with certain relations $\{r_i\}$. Suppose that one of
the relations has the form wx, where w is a word in y, z. Let $r_i'$ be the relation obtained
by substituting $w^{-1}$ for x into $r_i$, and let G' be the group generated by y, z, with relations
$\{r_i'\}$. Prove that G and G' are isomorphic.
9. The Todd-Coxeter Algorithm

1. Prove that the elements x, y of (9.5) generate T, and that the permutations (9.7) generate
$A_4$.

2. Use the Todd-Coxeter Algorithm to identify the group generated by two elements x, y,
with the following relations.

(a) $x^2 = y^2 = 1$, $xyx = yxy$

(b) $x^2 = [\ldots] = 1$, $xyx = [\ldots]$

(c) $x^3 = [\ldots] = 1$, $xyx = yxy$

(d) $x^4 = [\ldots] = 1$, $xyx = yxy$

(e) $x^4 = [\ldots] = x^2 y^2 = 1$



3. Use the Todd-Coxeter Algorithm to determine the order of the group generated by x, y,
with the following relations.

(a) $x^4 = 1$, $y^3 = 1$, $xy = y^2 x$ (b) $x^7 = 1$, $y^3 = 1$, $yx = x^2 y$.

4. Identify the group G generated by elements x, y, z, with relations $x^4 = y^4 = z^3 =
x^2 z^2 = 1$ and z = xy.

5. Analyze the group G generated by x, y, with relations $x^4 = 1$, $y^4 = 1$, $x^2 = y^2$,
$xy = y^3 x$.

*6. Analyze the group generated by elements x, y, with relations $x^{-1}yx = y^{-1}$, $y^{-1}xy = x^{-1}$.

7. Let G be the group generated by elements x, y, with relations $x^4 = 1$, $y^3 = 1$, $x^2 = yxy$.
Prove that this group is trivial in these two ways.

(a) using the Todd-Coxeter Algorithm

(b) working directly with the relations

8. Identify the group G generated by two elements x, y, with relations $x^3 = y^3 =
yxyxy = 1$.

9. Let p, q, r be integers > 1. The triangle group $G^{pqr}$ is defined by generators
$G^{pqr} = \langle x, y, z;\ x^p, y^q, z^r, xyz \rangle$. In each case, prove that the triangle group is isomorphic
to the group listed.

(a) the dihedral group $D_n$, when p, q, r = 2, 2, n

(b) the tetrahedral group, when p, q, r = 2, 3, 3

(c) the octahedral group, when p, q, r = 2, 3, 4

(d) the icosahedral group, when p, q, r = 2, 3, 5

10. Let $\Delta$ denote an isosceles right triangle, and let a, b, c denote the reflections of the plane
about the three sides of $\Delta$. Let x = ab, y = bc, z = ca. Prove that x, y, z generate a
triangle group.

11. (a) Prove that the group G generated by elements x, y, z with relations $x^2 = y^3 =
z^5 = 1$, xyz = 1 has order 60.

(b) Let H be the subgroup generated by x and $zyz^{-1}$. Determine the permutation
representation of G on G/H, and identify H.

(c) Prove that G is isomorphic to the alternating group $A_5$.

(d) Let K be the subgroup of G generated by x and yxz. Determine the permutation
representation of G on G/K, and identify K.

Miscellaneous Problems 

1. (a) Prove that the subgroup T' of $O_3$ of all symmetries of a regular tetrahedron,
including orientation-reversing symmetries, has order 24.

(b) Is T' isomorphic to the symmetric group $S_4$?

(c) State and prove analogous results for the group of symmetries of a dodecahedron.

2. (a) Let U = {1, x} be a subset of order 2 of a group G. Consider the graph having one
vertex for each element of G and an edge joining the vertices g to gx for all $g \in G$.
Prove that the vertices connected to the vertex 1 are the elements of the cyclic group
generated by x.

(b) Do the analogous thing for the set U = {1, x, y}.

*3. (a) Suppose that a group G operates transitively on a set S, and that H is the stabilizer of
an element $s_0 \in S$. Consider the action of G on $S \times S$ defined by $g(s_1, s_2) =
(gs_1, gs_2)$. Establish a bijective correspondence between double cosets of H in G and
G-orbits in $S \times S$.

(b) Work out the correspondence explicitly for the case that G is the dihedral group $D_5$
and S is the set of vertices of a 5-gon.

(c) Work it out for the case that G = T and that S is the set of edges of a tetrahedron.

*4. Assume that $H \subset K \subset G$ are subgroups, that H is normal in K, and that K is normal in
G. Prove or disprove: H is normal in G.

*5. Prove the Bruhat decomposition, which asserts that $GL_n(\mathbb{R})$ is the union of the double
cosets BPB, where B is the group of upper triangular matrices and P is a permutation
matrix.

6. (a) Prove that the group generated by x, y with relations $x^2$, $y^2$ is an infinite group in two
ways:

(i) It is clear that every word can be reduced by using these relations to the form
$\cdots xyxy \cdots$. Prove that every element of G is represented by exactly one such
word.

(ii) Exhibit G as the group generated by reflections r, r' about lines $\ell$, $\ell'$ whose
angle of intersection is not a rational multiple of $2\pi$.

(b) Let N be any proper normal subgroup of G. Prove that G/N is a dihedral group.

7. Let H, N be subgroups of a group G, and assume that N is a normal subgroup.

(a) Determine the kernels of the restrictions of the canonical homomorphism
$\pi: G \to G/N$ to the subgroups H and HN.

(b) Apply the First Isomorphism Theorem to these restrictions to prove the
Second Isomorphism Theorem: $H/(H \cap N)$ is isomorphic to (HN)/N.

8. Let H, N be normal subgroups of a group G such that $H \supset N$, and let $\bar{H} = H/N$,
$\bar{G} = G/N$.

(a) Prove that $\bar{H}$ is a normal subgroup of $\bar{G}$.

(b) Use the composed homomorphism $G \to \bar{G} \to \bar{G}/\bar{H}$ to prove the
Third Isomorphism Theorem: G/H is isomorphic to $\bar{G}/\bar{H}$.



Chapter 7 


Bilinear Forms 

I presume that to the uninitiated 
the formulae will appear cold and cheerless. 

Benjamin Peirce

1. DEFINITION OF BILINEAR FORM

Our model for bilinear forms is the dot product

(1.1) $(X \cdot Y) = X^t Y = x_1 y_1 + \cdots + x_n y_n$

of vectors in $\mathbb{R}^n$, which was described in Section 5 of Chapter 4. The symbol $(X \cdot Y)$
has various properties, the most important for us being the following:

(1.2) Bilinearity: $(X_1 + X_2 \cdot Y) = (X_1 \cdot Y) + (X_2 \cdot Y)$

$(X \cdot Y_1 + Y_2) = (X \cdot Y_1) + (X \cdot Y_2)$

$(cX \cdot Y) = c(X \cdot Y) = (X \cdot cY)$

Symmetry: $(X \cdot Y) = (Y \cdot X)$

Positivity: $(X \cdot X) > 0$, if $X \neq 0$.

Notice that bilinearity says this: If one variable is fixed, the resulting function of the
remaining variable is a linear transformation $\mathbb{R}^n \to \mathbb{R}$.

We will study dot product and its analogues in this chapter. It is clear how to
generalize bilinearity and symmetry to a vector space over any field, while positivity
is, a priori, applicable only when the scalar field is $\mathbb{R}$. We will also extend the
concept of positivity to complex vector spaces in Section 4.
Let V be a vector space over a field F. A bilinear form on V is a function of two
variables on V, with values in the field: $V \times V \to F$, satisfying the bilinear axioms,
which are

(1.3) $f(v_1 + v_2, w) = f(v_1, w) + f(v_2, w)$

$f(cv, w) = cf(v, w)$

$f(v, w_1 + w_2) = f(v, w_1) + f(v, w_2)$

$f(v, cw) = cf(v, w)$

for all $v, w, v_i, w_i \in V$ and all $c \in F$. Often a notation similar to dot product is
used. We will frequently use the notation

(1.4) $\langle v, w \rangle$

to designate the value f(v, w) of the form. So $\langle v, w \rangle$ is a scalar, an element of F.

A form $\langle\ ,\ \rangle$ is said to be symmetric if

(1.5) $\langle v, w \rangle = \langle w, v \rangle$

and skew-symmetric if

(1.6) $\langle v, w \rangle = -\langle w, v \rangle$,

for all $v, w \in V$. (This is actually not the right definition of skew-symmetry if the
field F is of characteristic 2, that is, if 1 + 1 = 0 in F. We will correct the definition
in Section 8.)

If the form f is either symmetric or skew-symmetric, then linearity in the second
variable follows from linearity in the first.

The main examples of bilinear forms are the forms on the space $F^n$ of column
vectors, obtained as follows: Let A be an n × n matrix in F, and define

(1.7) $\langle X, Y \rangle = X^t A Y$.

Note that this product is a 1 × 1 matrix, that is, a scalar, and that it is bilinear.
Ordinary dot product is included as the case A = I.

A matrix A is symmetric if

(1.8) $A^t = A$, that is, $a_{ij} = a_{ji}$ for all i, j.

(1.9) Proposition. The form (1.7) is symmetric if and only if the matrix A is
symmetric.

Proof. Assume that A is symmetric. Since $Y^t A X$ is a 1 × 1 matrix, it is equal to
its transpose: $Y^t A X = (Y^t A X)^t = X^t A^t Y = X^t A Y$. Thus $\langle Y, X \rangle = \langle X, Y \rangle$. The other
implication is obtained by setting $X = e_i$ and $Y = e_j$. We find $\langle e_i, e_j \rangle = e_i^t A e_j = a_{ij}$,
while $\langle e_j, e_i \rangle = a_{ji}$. If the form is symmetric, then $a_{ij} = a_{ji}$, and so A is symmetric. □


Let $\langle\ ,\ \rangle$ be a bilinear form on a vector space V, and let $B = (v_1, \ldots, v_n)$ be a
basis for V. We can relate the form to a product $X^t A Y$ by the matrix of the form with
respect to the basis. By definition, this is the matrix $A = (a_{ij})$, where

(1.10) $a_{ij} = \langle v_i, v_j \rangle$.

Note that A is a symmetric matrix if and only if $\langle\ ,\ \rangle$ is a symmetric form. Also, the
symmetry of the bilinear form does not depend on the basis. So if the matrix of the
form with respect to some basis is symmetric, its matrix with respect to any other
basis will be symmetric too.

The matrix A allows us to compute the value of the form on two vectors
$v, w \in V$. Let X, Y be their coordinate vectors, as in Section 4 of Chapter 3, so that
v = BX, w = BY. Then

$\langle v, w \rangle = \left\langle \sum_i v_i x_i,\ \sum_j v_j y_j \right\rangle$.

This expands using bilinearity to $\sum_{i,j} x_i y_j \langle v_i, v_j \rangle = \sum_{i,j} x_i a_{ij} y_j = X^t A Y$:

(1.11) $\langle v, w \rangle = X^t A Y$.

Thus, if we identify $F^n$ with V using the basis B as in Chapter 3 (4.14), the bilinear
form $\langle\ ,\ \rangle$ corresponds to $X^t A Y$.

As in the study of linear operators, a central problem is to describe the effect
of a change of basis on such a product. For example, we would like to know what
happens to dot product when the basis of $\mathbb{R}^n$ is changed. This will be discussed
presently. The effect of a change of basis B = B'P [Chapter 3 (4.16)] on the matrix
of the form can be determined easily from the rules X' = PX, Y' = PY: If A' is the
matrix of the form with respect to a new basis B', then by definition of A',
$\langle v, w \rangle = X'^t A' Y' = X^t P^t A' P Y$. But we also have $\langle v, w \rangle = X^t A Y$. So

(1.12) $P^t A' P = A$.

Let $Q = (P^{-1})^t$. Since P can be any invertible matrix, Q is also arbitrary.

(1.13) Corollary. Let A be the matrix of a bilinear form with respect to a basis.
The matrices A' which represent the same form with respect to different bases are the
matrices $A' = Q A Q^t$, where Q is an arbitrary matrix in $GL_n(F)$. □
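
As a numerical illustration of (1.12) and Corollary (1.13), here is a minimal Python sketch for the case n = 2 with rational entries. The helper names (`form`, `change_basis`, and so on) and the sample matrices A and P are our own choices, made only for illustration. The sketch computes $A' = QAQ^t$ with $Q = (P^{-1})^t$ and checks that the value of the form is unchanged when coordinates are transformed by X' = PX.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(M, X):
    return [sum(m * x for m, x in zip(row, X)) for row in M]

def form(A, X, Y):
    """Value X^t A Y of the bilinear form with matrix A."""
    return sum(x * ay for x, ay in zip(X, matvec(A, Y)))

def inverse_2x2(P):
    """Inverse of an invertible 2 x 2 matrix, computed exactly."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def change_basis(A, P):
    """Return A' = Q A Q^t with Q = (P^{-1})^t, so that P^t A' P = A."""
    Q = transpose(inverse_2x2(P))
    return matmul(matmul(Q, A), transpose(Q))

# Sample data (hypothetical): a symmetric matrix A and a change of basis P.
A = [[2, 1], [1, 3]]
P = [[1, 2], [0, 1]]
A_new = change_basis(A, P)

# Coordinates transform by X' = PX, Y' = PY; the value of the form is unchanged.
X, Y = [1, 4], [-2, 5]
same_value = form(A, X, Y) == form(A_new, matvec(P, X), matvec(P, Y))
```

With these sample matrices, `A_new` is again symmetric, as the remark following (1.10) predicts.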

Let us now apply formula (1.12) to our original example of dot product on $\mathbb{R}^n$.
The matrix of the dot product with respect to the standard basis is the identity
matrix: $(X \cdot Y) = X^t I Y$. So formula (1.12) tells us that if we change basis, the matrix of
the form changes to

(1.14) $A' = (P^{-1})^t I (P^{-1}) = (P^{-1})^t (P^{-1})$,

where P is the matrix of change of basis as before. If the matrix P happens to be
orthogonal, meaning that $P^t P = I$, then A' = I, and dot product carries over to dot
product: $(X \cdot Y) = (PX \cdot PY) = (X' \cdot Y')$, as we saw in Chapter 4 (5.13). But under a
general change of basis, the formula for dot product changes to $X'^t A' Y'$, where A' is
as in (1.14). For example, let n = 2, and let the basis B' be

[The displayed basis vectors are not recoverable from the source.]

Then

(1.15) $P^{-1} = $ [The displayed matrices are not recoverable from the source.]

The matrix A' represents dot product on $\mathbb{R}^2$, with respect to the basis B'.

We can also turn the computation around. Suppose that we are given a bilinear
form $\langle\ ,\ \rangle$ on a real vector space V. Let us ask whether or not this form becomes dot
product when we choose a suitable basis. We start with an arbitrary basis B, so that
we have a matrix A to work with. Then the problem is to change this basis in such a
way that the new matrix is the identity, if that is possible. By formula (1.12), this
amounts to solving the matrix equation $I = (P^{-1})^t A (P^{-1})$, or

(1.16) $A = P^t P$.

(1.17) Corollary. The matrices A which represent a form equivalent to dot
product are the matrices $A = P^t P$, where P is invertible. □

This corollary gives a theoretical answer to our problem of determining the
bilinear forms equivalent to dot product, but it is not very satisfactory because we
don't yet have a practical method of deciding which matrices can be written as a
product $P^t P$, let alone a practical method of finding P.

We can get some conditions on the matrix A from the properties of dot product
listed in (1.2). Bilinearity imposes no condition on A, because the symbol $X^t A Y$ is
always bilinear. However, symmetry and positivity restrict the possibilities. The easier
property to check is symmetry: In order to represent dot product, the matrix A must
be symmetric. Positivity is also a strong restriction. In order to represent dot
product, the matrix A must have the property that

(1.18) $X^t A X > 0$, for all $X \neq 0$.

A real symmetric matrix having this property is called positive definite.

(1.19) Theorem. The following properties of a real n × n matrix A are equivalent:

(i) A represents dot product, with respect to some basis of $\mathbb{R}^n$.

(ii) There is an invertible matrix $P \in GL_n(\mathbb{R})$ such that $A = P^t P$.

(iii) A is symmetric and positive definite.



We have seen that (i) and (ii) are equivalent [Corollary (1.17)] and that (i) implies
(iii). So it remains to prove the remaining implication, that (iii) implies (i). It will be
more convenient to restate this implication in vector space form.
A symmetric bilinear form $\langle\ ,\ \rangle$ on a finite-dimensional real vector space V is
called positive definite if

(1.20) $\langle v, v \rangle > 0$

for every nonzero vector $v \in V$. Thus a real symmetric matrix A is positive definite
if and only if the form $\langle X, Y \rangle = X^t A Y$ it defines on $\mathbb{R}^n$ is a positive definite form.
Also, the form $\langle\ ,\ \rangle$ is positive definite if and only if its matrix A with respect to any
basis is a positive definite matrix. This is clear, because if X is the coordinate vector
of a vector v, then $\langle v, v \rangle = X^t A X$ (1.11).

Two vectors v, w are called orthogonal with respect to a symmetric form if
$\langle v, w \rangle = 0$. Orthogonality of two vectors is often denoted as

(1.21) $v \perp w$.

This definition extends the concept of orthogonality which we have already seen
when the form is dot product on $\mathbb{R}^n$ [Chapter 4 (5.12)]. A basis $B = (v_1, \ldots, v_n)$ of V
is called an orthonormal basis with respect to the form if

$\langle v_i, v_j \rangle = 0$ for all $i \neq j$, and $\langle v_i, v_i \rangle = 1$ for all i.

It follows directly from the definition that a basis B is orthonormal if and only if the
matrix of the form with respect to B is the identity matrix.

(1.22) Theorem. Let $\langle\ ,\ \rangle$ be a positive definite symmetric form on a
finite-dimensional real vector space V. There exists an orthonormal basis for V.

Proof. We will describe a method called the Gram-Schmidt procedure for
constructing an orthonormal basis, starting with an arbitrary basis $B = (v_1, \ldots, v_n)$.
Our first step is to normalize $v_1$, so that $\langle v_1, v_1 \rangle = 1$. To do this we note that

(1.23) $\langle cv, cv \rangle = c^2 \langle v, v \rangle$.

Since the form is positive definite, $\langle v_1, v_1 \rangle > 0$. We set $c = \langle v_1, v_1 \rangle^{-1/2}$, and replace
$v_1$ by $w_1 = cv_1$.

Next we look for a linear combination of $w_1$ and $v_2$ which is orthogonal to $w_1$.
The required linear combination is $w = v_2 - aw_1$, where $a = \langle v_2, w_1 \rangle$: $\langle w, w_1 \rangle =
\langle v_2, w_1 \rangle - a\langle w_1, w_1 \rangle = \langle v_2, w_1 \rangle - a = 0$. We normalize this vector w to length 1,
obtaining a vector $w_2$ which we substitute for $v_2$. The geometric interpretation of this
operation is illustrated below for the case that the form is dot product. The vector
$aw_1$ is the orthogonal projection of $v_2$ onto the subspace (the line) spanned by $w_1$.
This is the general procedure. Suppose that the k - 1 vectors $w_1, \ldots, w_{k-1}$ are
orthonormal and that $(w_1, \ldots, w_{k-1}, v_k, \ldots, v_n)$ is a basis. We adjust $v_k$ as follows: We
let $a_i = \langle v_k, w_i \rangle$ and

(1.24) $w = v_k - a_1 w_1 - a_2 w_2 - \cdots - a_{k-1} w_{k-1}$.

Then w is orthogonal to $w_i$ for $i = 1, \ldots, k - 1$, because

$\langle w, w_i \rangle = \langle v_k, w_i \rangle - a_1 \langle w_1, w_i \rangle - a_2 \langle w_2, w_i \rangle - \cdots - a_{k-1} \langle w_{k-1}, w_i \rangle$.

Since $w_1, \ldots, w_{k-1}$ are orthonormal, all the terms $\langle w_j, w_i \rangle$, $1 \le j \le k - 1$, are zero
except for the term $\langle w_i, w_i \rangle$, which is 1. So the sum reduces to

$\langle w, w_i \rangle = \langle v_k, w_i \rangle - a_i \langle w_i, w_i \rangle = \langle v_k, w_i \rangle - a_i = 0$.

We normalize the length of w to 1, obtaining a vector $w_k$ which we substitute for $v_k$
as before. Then $(w_1, \ldots, w_k)$ is orthonormal. Since $v_k$ is in the span of
$(w_1, \ldots, w_k; v_{k+1}, \ldots, v_n)$, this set is a basis. The existence of an orthonormal basis
follows by induction on k. □
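
The procedure in the proof can be sketched in a few lines of Python for a form $\langle X, Y \rangle = X^t A Y$ on $\mathbb{R}^n$. This is a minimal floating-point sketch, not a numerically careful implementation; the function names `form` and `gram_schmidt` and the sample matrix are our own. It assumes A is symmetric and positive definite, and it raises an error if a vector with $\langle w, w \rangle \le 0$ turns up.

```python
import math

def form(A, X, Y):
    """Value X^t A Y of the form with matrix A."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

def gram_schmidt(A, basis):
    """Orthonormalize `basis` with respect to the form X^t A Y, as in (1.24)."""
    ws = []
    for v in basis:
        w = list(v)
        for u in ws:
            a = form(A, v, u)                  # a_i = <v_k, w_i>
            w = [wi - a * ui for wi, ui in zip(w, u)]
        norm_sq = form(A, w, w)
        if norm_sq <= 0:
            raise ValueError("form is not positive definite on this basis")
        c = 1 / math.sqrt(norm_sq)             # c = <w, w>^(-1/2), as in (1.23)
        ws.append([c * wi for wi in w])
    return ws

# Sample positive definite matrix (hypothetical) and the standard basis of R^2.
A = [[2, 1], [1, 2]]
ws = gram_schmidt(A, [[1, 0], [0, 1]])
gram = [[form(A, u, v) for v in ws] for u in ws]   # close to the identity matrix
```

The matrix `gram` is the matrix of the form with respect to the new basis, so up to rounding error it should be the identity.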


End of the proof of Theorem (1.19). The fact that part (iii) of Theorem (1.19)
implies (i) follows from Theorem (1.22). For if A is symmetric and positive definite,
then the form $\langle X, Y \rangle = X^t A Y$ it defines on $\mathbb{R}^n$ is also symmetric and positive definite.
In that case, Theorem (1.22) tells us that there is a basis B' of $\mathbb{R}^n$ which is
orthonormal with respect to the form $\langle X, Y \rangle = X^t A Y$. (But the basis will probably not be
orthonormal with respect to the usual dot product on $\mathbb{R}^n$.) Now on the one hand, the
matrix A' of the form $\langle X, Y \rangle$ with respect to the new basis B' satisfies the relation
$P^t A' P = A$ (1.12), and on the other hand, since B' is orthonormal, A' = I. Thus
$A = P^t P$. This proves (ii), and since (i) and (ii) are already known to be equivalent,
it also proves (i). □


Unfortunately, there is no really simple way to show that a matrix is positive
definite. One of the most convenient criteria is the following: Denote the upper left
i × i submatrix of A by $A_i$.

(1.25) Theorem. A real symmetric n × n matrix A is positive definite if and only
if the determinant det $A_i$ is positive for each $i = 1, \ldots, n$.

For example, the 2 × 2 matrix

(1.26) $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$

is positive definite if and only if a > 0 and $ad - b^2 > 0$. Using this criterion, we
can check immediately that the matrix A' of (1.15) is positive definite, which agrees
with the fact that it represents dot product.

The proof of Theorem (1.25) is at the end of the next section.
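
The criterion of Theorem (1.25) is easy to apply mechanically. The following Python sketch computes the determinants det $A_i$ exactly over the rationals and tests their signs. The function names (`det`, `leading_minors`, `is_positive_definite`) are our own, and the determinant routine is plain Gaussian elimination, chosen here only for brevity.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result                   # a row swap changes the sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return result

def leading_minors(A):
    """Determinants of the upper left i x i submatrices A_i, i = 1, ..., n."""
    return [det([row[:i] for row in A[:i]]) for i in range(1, len(A) + 1)]

def is_positive_definite(A):
    """Test of Theorem (1.25): A symmetric and every det A_i positive."""
    n = len(A)
    symmetric = all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
    return symmetric and all(d > 0 for d in leading_minors(A))
```

For instance, `is_positive_definite([[2, 1], [1, 2]])` holds because a = 2 > 0 and $ad - b^2 = 3 > 0$, while the matrix (2.1) of the Lorentz form fails the test at i = 4.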
2. SYMMETRIC FORMS: ORTHOGONALITY

In this section, we consider a finite-dimensional real vector space V on which a
symmetric bilinear form $\langle\ ,\ \rangle$ is given, but we drop the assumption made in the last
section that the form is positive definite. A form such that $\langle v, v \rangle$ takes on both positive
and negative values is called indefinite. The Lorentz form

$X^t A Y = x_1 y_1 + x_2 y_2 + x_3 y_3 - c^2 x_4 y_4$

of physics is a typical example of an indefinite form on "space-time" $\mathbb{R}^4$. The
coefficient c representing the speed of light can be normalized to 1, and then the
matrix of the form with respect to the given basis becomes

(2.1) $\begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{bmatrix}$.

We now pose the problem of describing all symmetric forms on a finite-dimensional
real vector space. The basic notion used in the study of such a form is still that
of orthogonality. But if a form is not positive definite, it may happen that a nonzero
vector v is self-orthogonal: $\langle v, v \rangle = 0$. For example, this is true for the vector
$(1, 0, 0, 1)^t \in \mathbb{R}^4$, when the form is defined by (2.1). So we must revise our
geometric intuition. It turns out that there is no need to worry about this point. There are
enough vectors which are not self-orthogonal to serve our purposes.

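(2.2) Proposition. Suppose the symmetric form $\langle\ ,\ \rangle$ is not identically zero. Then
there is a vector $v \in V$ which is not self-orthogonal: $\langle v, v \rangle \neq 0$.

Proof. To say that $\langle\ ,\ \rangle$ is not identically zero means that there is a pair of
vectors $v, w \in V$ such that $\langle v, w \rangle \neq 0$. Take these vectors. If $\langle v, v \rangle \neq 0$, or if
$\langle w, w \rangle \neq 0$, then the proposition is verified. Suppose $\langle v, v \rangle = \langle w, w \rangle = 0$. Let
u = v + w, and expand $\langle u, u \rangle$ using bilinearity:

$\langle u, u \rangle = \langle v + w, v + w \rangle = \langle v, v \rangle + \langle v, w \rangle + \langle w, v \rangle + \langle w, w \rangle = 0 + 2\langle v, w \rangle + 0$.

Since $\langle v, w \rangle \neq 0$, it follows that $\langle u, u \rangle \neq 0$. □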
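
The proof is constructive, and a small Python sketch makes the three cases explicit. The function name `non_self_orthogonal` and the sample matrix are our own. The example uses the symmetric matrix with rows (0, 1) and (1, 0), for which both standard basis vectors are self-orthogonal, so the proof's third case applies.

```python
def form(A, X, Y):
    """Value X^t A Y of the form with matrix A."""
    n = len(A)
    return sum(X[i] * A[i][j] * Y[j] for i in range(n) for j in range(n))

def non_self_orthogonal(A, v, w):
    """Given v, w with <v, w> != 0, return one of v, w, v + w with <u, u> != 0."""
    if form(A, v, w) == 0:
        raise ValueError("need a pair with <v, w> != 0")
    for u in (v, w, [a + b for a, b in zip(v, w)]):
        if form(A, u, u) != 0:
            return u
    # Not reached for a real symmetric form: <v + w, v + w> = 2<v, w> != 0.
    raise AssertionError("unexpected: all three candidates are self-orthogonal")

# Sample indefinite symmetric form (hypothetical): <X, Y> = x1*y2 + x2*y1.
A = [[0, 1], [1, 0]]
u = non_self_orthogonal(A, [1, 0], [0, 1])   # both inputs are self-orthogonal
```

Here `u` is the sum v + w = (1, 1), and $\langle u, u \rangle = 2\langle v, w \rangle = 2$.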

If W is a subspace of V, then we will denote by $W^\perp$ the set of all vectors v
which are orthogonal to every $w \in W$:

(2.3) $W^\perp = \{v \in V \mid \langle v, W \rangle = 0\}$.

This is a subspace of V, called the orthogonal complement to W.

(2.4) Proposition. Let $w \in V$ be a vector such that $\langle w, w \rangle \neq 0$. Let W = {cw}
be the span of w. Then V is the direct sum of W and its orthogonal complement:

$V = W \oplus W^\perp$.
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Proof. According to Chapter 3 (6.4, 6.5), we have to show two things: 


(a) W Pi W 1 = 0. This is clear. The vector cw is not orthogonal to w unless 
c = 0, because (cw, w) = c(w, w) and (w, w) ^ 0. 

(b) W and W 1 span V: Every vector v E V can be written in the form 

v = aw + where v r E To show this, we solve the equation 

(v — aw, w) = 0 for a: (v — aw, w) = (v,w) — a(w, w) = 0. The solution is 

(v,w) 


a 


(w, w) 


We SQtv f = v 


aw. 
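Part (b) of the proof is effectively an algorithm. A minimal sketch, assuming the form is given by its matrix with respect to the standard basis (the helper names `form` and `split_off` are hypothetical), might look like this:

```python
from fractions import Fraction

def form(A, x, y):
    """Value x^t A y of the symmetric form with matrix A."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def split_off(A, v, w):
    """Return (a, v_prime) with v = a*w + v_prime and <v_prime, w> = 0.

    Requires <w, w> != 0, as in Proposition (2.4).
    """
    ww = form(A, w, w)
    if ww == 0:
        raise ValueError("w is self-orthogonal; (2.4) does not apply")
    a = Fraction(form(A, v, w)) / ww
    v_prime = [vi - a * wi for vi, wi in zip(v, w)]
    return a, v_prime

# Lorentz form (2.1); the vectors below are arbitrary test data.
LORENTZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
w = [1, 0, 0, 2]
a, v_prime = split_off(LORENTZ, [3, 1, 0, 2], w)
print(a, v_prime, form(LORENTZ, v_prime, w))
```

Exact rational arithmetic is used so that the orthogonality of `v_prime` to `w` holds exactly rather than up to rounding.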


Two more concepts which we will need are the null space of a symmetric form
and nondegenerate form. A vector $v \in V$ is called a null vector for the given form if
$\langle v, w\rangle = 0$ for all $w \in V$, that is, if $v$ is orthogonal to the whole space $V$. The null
space of the form is the set of all null vectors

(2.5)   $N = \{v \mid \langle v, V\rangle = 0\} = V^\perp$.

A symmetric form is said to be nondegenerate if the null space is $\{0\}$.

(2.6) Proposition. Let $A$ be the matrix of a symmetric form with respect to a
basis.

(a) The null space of the form is the set of vectors $v$ such that the coordinate
vector $X$ of $v$ is a solution of the homogeneous equation $AX = 0$.

(b) The form is nondegenerate if and only if the matrix $A$ is nonsingular.

Proof. Via the basis, the form corresponds to the product $X^tAY$ [see (1.11)].
We might as well work with this product. If $Y$ is a vector such that $AY = 0$, then
$X^tAY = 0$ for all $X$; hence $Y$ is in the null space. Conversely, suppose that $AY \neq 0$.
Then $AY$ has at least one nonzero coordinate. The $i$th coordinate of $AY$ is $e_i^tAY$. So
one of the products $e_i^tAY$ is not zero. This shows that $Y$ is not a null vector, which
proves (a). Part (b) of the proposition follows from (a). □

Here is a generalized version of (2.4):

(2.7) Proposition. Let $W$ be a subspace of $V$, and consider the restriction of a
symmetric form $\langle\ ,\ \rangle$ to $W$. Suppose that this form is nondegenerate on $W$. Then
$V = W \oplus W^\perp$.

We omit the proof, which closely follows that of (2.4). □

(2.8) Definition. An orthogonal basis $B = (v_1, \dots, v_n)$ for $V$, with respect to a
symmetric form $\langle\ ,\ \rangle$, is a basis such that $v_i \perp v_j$ for all $i \neq j$.

Since the matrix $A$ of a form is defined by $a_{ij} = \langle v_i, v_j\rangle$, the basis $B$ is
orthogonal if and only if $A$ is a diagonal matrix. Note that if the symmetric form
$\langle\ ,\ \rangle$ is nondegenerate and the basis $B = (v_1, \dots, v_n)$ is orthogonal, then
$\langle v_i, v_i\rangle \neq 0$ for all $i$: the diagonal entries of $A$ are nonzero.

(2.9) Theorem. Let $\langle\ ,\ \rangle$ be a symmetric form on a real vector space $V$.

(a) There is an orthogonal basis for $V$. More precisely, there exists a basis
$B = (v_1, \dots, v_n)$ such that $\langle v_i, v_j\rangle = 0$ for $i \neq j$ and such that for each $i$,
$\langle v_i, v_i\rangle$ is either 1, -1, or 0.

(b) Matrix form: Let $A$ be a real symmetric $n \times n$ matrix. There is a matrix
$Q \in GL_n(\mathbb{R})$ such that $QAQ^t$ is a diagonal matrix each of whose diagonal
entries is 1, -1, or 0.

Part (b) of the theorem follows from (a) and (1.13), taking into account the fact
that any symmetric matrix $A$ is the matrix of a symmetric form. □

We can permute an orthogonal basis $B$ so that the indices with $\langle v_i, v_i\rangle = 1$ are
the first ones, and so on. Then the matrix $A$ of the form will be

(2.10)   $A = \begin{bmatrix} I_p & & \\ & -I_m & \\ & & 0_z \end{bmatrix}$,

where $p$ is the number of +1's, $m$ is the number of -1's, and $z$ is the number of 0's,
so that $p + m + z = n$. These numbers are uniquely determined by the form or by
the matrix $A$:

(2.11) Theorem. Sylvester's Law: The numbers $p, m, z$ appearing in (2.10) are
uniquely determined by the form. In other words, they do not depend on the choice
of orthogonal basis $B$ such that $\langle v_i, v_i\rangle = \pm 1$ or 0.

The pair of integers $(p, m)$ is called the signature of the form.

Proof of Theorem (2.9). If the form is identically zero, then the matrix $A$, computed
with respect to any basis, will be the zero matrix, which is diagonal. Suppose the
form is not identically zero. Then by Proposition (2.2), there is a vector $v = v_1$ with
$\langle v_1, v_1\rangle \neq 0$. Let $W$ be the span of $v_1$. By Proposition (2.4), $V = W \oplus W^\perp$, and so a
basis for $V$ is obtained by combining the basis $(v_1)$ of $W$ with any basis $(v_2, \dots, v_n)$ of
$W^\perp$ [Chapter 3 (6.6)]. The form on $V$ can be restricted to the subspace $W^\perp$, and it
defines a form there. We use induction on the dimension to conclude that $W^\perp$ has an
orthogonal basis $(v_2, \dots, v_n)$. Then $(v_1, v_2, \dots, v_n)$ is an orthogonal basis for $V$. For,
$\langle v_1, v_i\rangle = 0$ if $i > 1$ because $v_i \in W^\perp$, and $\langle v_i, v_j\rangle = 0$ if $i, j > 1$ and $i \neq j$,
because $(v_2, \dots, v_n)$ is an orthogonal basis.

It remains to normalize the orthogonal basis just constructed. If $\langle v_i, v_i\rangle \neq 0$,
we solve $c^{-2} = \pm\langle v_i, v_i\rangle$ and change the basis vector $v_i$ to $cv_i$. Then $\langle v_i, v_i\rangle$ is
changed to $\pm 1$. This completes the proof of (2.9). □
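The inductive proof translates into a procedure for computing the numbers $p, m, z$ of (2.10) from a symmetric matrix. The sketch below is one possible rendering, not the only one: it performs the changes of basis $A \mapsto QAQ^t$ as paired row and column operations over the rationals, and when every remaining diagonal entry vanishes it uses the vector $v_i + v_j$ from the proof of (2.2). The function and helper names are hypothetical.

```python
from fractions import Fraction

def signature(A):
    """Return (p, m, z) for the real symmetric matrix A.

    Each step is a change of basis A -> Q A Q^t, carried out as a row
    operation followed by the matching column operation.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]

    def add_multiple(i, j, c):
        # Replace the basis vector v_i by v_i + c * v_j.
        for k in range(n):
            A[i][k] += c * A[j][k]
        for k in range(n):
            A[k][i] += c * A[k][j]

    def swap(i, j):
        # Interchange the basis vectors v_i and v_j.
        A[i], A[j] = A[j], A[i]
        for row in A:
            row[i], row[j] = row[j], row[i]

    p = m = 0
    for k in range(n):
        # Look for a remaining basis vector that is not self-orthogonal.
        pivot = next((i for i in range(k, n) if A[i][i] != 0), None)
        if pivot is None:
            pair = next(((i, j) for i in range(k, n)
                         for j in range(i + 1, n) if A[i][j] != 0), None)
            if pair is None:
                break  # the form is identically zero on what remains
            i, j = pair
            add_multiple(i, j, Fraction(1))  # now <v_i, v_i> = 2<v_i, v_j>
            pivot = i
        swap(k, pivot)
        d = A[k][k]
        # Make the remaining vectors orthogonal to v_k, as in (2.4).
        for i in range(k + 1, n):
            add_multiple(i, k, -A[i][k] / d)
        if d > 0:
            p += 1
        else:
            m += 1
    return p, m, n - p - m

print(signature([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]))
print(signature([[0, 1], [1, 0]]))
```

The final normalization to $\pm 1$ is omitted, since only the signs of the diagonal entries matter for $p$ and $m$.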
Proof of Theorem (2.11). Let $r = p + m$. (This is the rank of the matrix $A$.) Let
$(v_1, \dots, v_n)$ be an orthogonal basis of $V$ of the type under consideration, that is, so
that the matrix is (2.10). We will first show that the number $z$ is determined by
proving that the vectors $v_{r+1}, \dots, v_n$ form a basis for the null space $N = V^\perp$. This will
show that $z = \dim N$, hence that $z$ does not depend on the choice of a basis.

A vector $w \in V$ is a null vector if and only if it is orthogonal to every
element $v_i$ of our basis. We write our vector as a linear combination of the basis:
$w = c_1v_1 + \cdots + c_nv_n$. Then since $\langle v_i, v_j\rangle = 0$ if $i \neq j$, we find
$\langle w, v_i\rangle = c_i\langle v_i, v_i\rangle$. Now $\langle v_i, v_i\rangle = 0$ if and only if $i > r$. So in order for $w$ to be
orthogonal to every $v_i$, we must have $c_i = 0$ for all $i \leq r$. This shows that
$(v_{r+1}, \dots, v_n)$ spans $N$, and, being a linearly independent set, it is a basis for $N$.

The equation $p + m + z = n$ proves that $p + m$ is also determined. We still
have to show that one of the two remaining integers $p, m$ is determined. This is not
quite so simple. It is not true that the span of $(v_1, \dots, v_p)$, for instance, is uniquely
determined by the form.

Suppose a second such basis $(v_1', \dots, v_n')$ is given and leads to integers $p', m'$
(with $z' = z$). We will show that the $p + (n - p')$ vectors

(2.12)   $v_1, \dots, v_p;\ v'_{p'+1}, \dots, v'_n$

are linearly independent. Then since $V$ has dimension $n$, it will follow that
$p + (n - p') \leq n$, hence that $p \leq p'$, and, interchanging the roles of $p$ and $p'$,
that $p = p'$.

Let a linear relation between the vectors (2.12) be given. We may write it in
the form

(2.13)   $b_1v_1 + \cdots + b_pv_p = c_{p'+1}v'_{p'+1} + \cdots + c_nv'_n$.

Let $v$ denote the vector defined by either of these two expressions. We compute
$\langle v, v\rangle$ in two ways. The left-hand side gives

$\langle v, v\rangle = b_1^2\langle v_1, v_1\rangle + \cdots + b_p^2\langle v_p, v_p\rangle = b_1^2 + \cdots + b_p^2 \geq 0$,

while the right-hand side gives

$\langle v, v\rangle = c_{p'+1}^2\langle v'_{p'+1}, v'_{p'+1}\rangle + \cdots + c_n^2\langle v'_n, v'_n\rangle = -c_{p'+1}^2 - \cdots - c_{p'+m'}^2 \leq 0$.

It follows that $b_1^2 + \cdots + b_p^2 = 0$, hence that $b_1 = \cdots = b_p = 0$. Once this is
known, the fact that $(v_1', \dots, v_n')$ is a basis combines with (2.13) to imply
$c_{p'+1} = \cdots = c_n = 0$. Therefore the relation was trivial, as required. □

For dealing with indefinite forms, the notation $I_{p,m}$ is often used to denote the
diagonal matrix

(2.14)   $I_{p,m} = \begin{bmatrix} I_p & \\ & -I_m \end{bmatrix}$.

With this notation, the matrix representing the Lorentz form (2.1) is $I_{3,1}$.

We will now prove Theorem (1.25), which states that a matrix $A$ is positive definite
if and only if $\det A_i > 0$ for all $i$.
Proof of Theorem (1.25). Suppose that the form $X^tAY$ is positive definite. A
change of basis in $\mathbb{R}^n$ changes the matrix to $A' = QAQ^t$, and

$\det A' = (\det Q)(\det A)(\det Q^t) = (\det Q)^2(\det A)$.

Since they differ by a square factor, $\det A'$ is positive if and only if $\det A$ is
positive. By (1.19), we can choose a matrix $Q$ so that $A' = I$, and since $I$ has
determinant 1, $\det A > 0$.

The matrix $A_i$ represents the restriction of the form to the subspace $V_i$ spanned
by $(v_1, \dots, v_i)$, and of course the form is positive definite on $V_i$. Therefore $\det A_i > 0$
for the same reason that $\det A > 0$.

Conversely, suppose that $\det A_i$ is positive for all $i$. By induction on $n$, we may
assume the form to be positive definite on $V_{n-1}$. Therefore there is a matrix
$Q' \in GL_{n-1}$ such that $Q'A_{n-1}Q'^t = I_{n-1}$. Let $Q$ be the matrix

$Q = \begin{bmatrix} Q' & 0 \\ 0 & 1 \end{bmatrix}$.

Then

$QAQ^t = \begin{bmatrix} & & & * \\ & I_{n-1} & & \vdots \\ & & & * \\ * & \cdots & * & * \end{bmatrix}$.

We now clear out the bottom row of this matrix, except for the $(n, n)$ entry, by
elementary row operations $E_1, \dots, E_{n-1}$. Let $P = E_{n-1} \cdots E_1Q$. Then

$A' = PAP^t = \begin{bmatrix} & & & 0 \\ & I_{n-1} & & \vdots \\ & & & 0 \\ 0 & \cdots & 0 & c \end{bmatrix}$

for some $c$. The last column has also been cleared out because $A' = PAP^t$ is
symmetric. Since $\det A > 0$, we have $\det A' = (\det A)(\det P)^2 > 0$ too, and this implies
that $c > 0$. Therefore the matrix $A'$ represents a positive definite form. It also
represents the same form as $A$ does. So $A$ is positive definite. □
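The criterion of Theorem (1.25) is easy to test by machine. The following sketch computes the determinants $\det A_i$ of the upper left $i \times i$ submatrices by Gaussian elimination over the rationals; the function names are hypothetical, and the $3 \times 3$ test matrix is an arbitrary illustration.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for k in range(n):
        pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            result = -result  # a row interchange changes the sign
        result *= M[k][k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= factor * M[k][j]
    return result

def leading_minors(A):
    """The determinants det A_1, ..., det A_n of Theorem (1.25)."""
    return [det([row[:i] for row in A[:i]]) for i in range(1, len(A) + 1)]

def is_positive_definite(A):
    return all(d > 0 for d in leading_minors(A))

LORENTZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
TRIDIAGONAL = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
print(leading_minors(TRIDIAGONAL), is_positive_definite(TRIDIAGONAL))
print(leading_minors(LORENTZ), is_positive_definite(LORENTZ))
```

For the Lorentz matrix the first three minors are positive and only the last one fails, which reflects the fact that the form is positive definite on the subspace spanned by the first three basis vectors.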


3. THE GEOMETRY ASSOCIATED TO A POSITIVE FORM

In this section we return to look once more at a positive definite bilinear form
$\langle\ ,\ \rangle$ on an $n$-dimensional real vector space $V$. A real vector space together with
such a form is often called a Euclidean space.

It is natural to define the length of a vector $v$ by the rule

(3.1)   $|v| = \sqrt{\langle v, v\rangle}$,

in analogy with the length of vectors in $\mathbb{R}^n$ [Chapter 4 (5.10)]. One important
consequence of the fact that the form is positive definite is that we can decide whether
a vector $v$ is zero by computing its length:

(3.2)   $v = 0$ if and only if $\langle v, v\rangle = 0$.

As was shown in Section 1, there is an orthonormal basis $B = (v_1, \dots, v_n)$ for
$V$, and thereby the form corresponds to dot product on $\mathbb{R}^n$:

$\langle v, w\rangle = X^tY$,

if $v = BX$ and $w = BY$. Using this correspondence, we can transfer the geometry of
$\mathbb{R}^n$ over to $V$. Whenever a problem is presented to us on a Euclidean space $V$, a
natural procedure will be to choose a convenient orthonormal basis, thereby reducing
the problem to the familiar case of dot product on $\mathbb{R}^n$.

When a subspace $W$ of $V$ is given to us, there are two operations we can make.
The first is to restrict the form $\langle\ ,\ \rangle$ to the subspace, simply by defining the value
of the form on a pair $w_1, w_2$ of vectors in $W$ to be $\langle w_1, w_2\rangle$. The restriction of a
bilinear form to a subspace $W$ is a bilinear form on $W$, and if the form is symmetric or
if it is symmetric and positive definite, then so is the restriction.

Restriction of the form can be used to define the unoriented angle between two
vectors $v, w$. If the vectors are linearly dependent, the angle is zero. Otherwise,
$(v, w)$ is a basis of a two-dimensional subspace $W$ of $V$. The restriction of the form
to $W$ is still positive definite, and therefore there is an orthonormal basis $(w_1, w_2)$ for
$W$. By means of this basis, $v, w$ correspond to their coordinate vectors $X, Y$ in $\mathbb{R}^2$.
This allows us to interpret geometric properties of the vectors $v, w$ in terms of
properties of $X, Y$.

Since the basis $(w_1, w_2)$ is orthonormal, the form corresponds to dot product
on $\mathbb{R}^2$: $\langle v, w\rangle = X^tY$. Therefore

$|v| = |X|$, $|w| = |Y|$, and $\langle v, w\rangle = (X \cdot Y)$.

We define the angle $\theta$ between $v$ and $w$ to be the angle between $X$ and $Y$, and thereby
obtain the formula

(3.3)   $\langle v, w\rangle = |v||w|\cos\theta$,

as a consequence of the analogous formula [Chapter 4 (5.11)] for dot product in $\mathbb{R}^2$.
This formula determines $\cos\theta$ in terms of the other symbols, and $\cos\theta$ determines $\theta$
up to a factor of $\pm 1$. Therefore the angle between $v$ and $w$ is determined up to sign
by the form alone. This is the best that can be done, even in $\mathbb{R}^3$.

Standard facts such as the Schwarz Inequality

(3.4)   $|\langle v, w\rangle| \leq |v||w|$

and the Triangle Inequality

(3.5)   $|v + w| \leq |v| + |w|$

can also be proved for arbitrary Euclidean spaces by restriction to a two-dimensional
subspace.
The second operation we can make when a subspace $W$ is given is to project $V$
onto $W$. Since the restriction of the form to $W$ is positive definite, it is
nondegenerate. Therefore $V = W \oplus W^\perp$ by (2.7), and so every $v \in V$ has a unique expression

(3.6)   $v = w + w'$, with $w \in W$ and $w' \in W^\perp$, so that $\langle w, w'\rangle = 0$.

The orthogonal projection $\pi: V \to W$ is defined to be the linear transformation

(3.7)   $v \mapsto \pi(v) = w$,

where $w$ is as in (3.6).

The projected vector $\pi(v)$ can be computed easily in terms of an orthonormal
basis $(w_1, \dots, w_r)$ of $W$. What follows is important:

(3.8) Proposition. Let $(w_1, \dots, w_r)$ be an orthonormal basis of a subspace $W$, and
let $v \in V$. The orthogonal projection $\pi(v)$ of $v$ onto $W$ is the vector

$\pi(v) = \langle v, w_1\rangle w_1 + \cdots + \langle v, w_r\rangle w_r$.

Thus if $\pi$ is defined by the above formula, then $v - \pi(v)$ is orthogonal to $W$. This
formula explains the geometric meaning of the Gram-Schmidt procedure described
in Section 1.

Proof. Let us denote the right side of the above equation by $\tilde{w}$. Then
$\langle \tilde{w}, w_i\rangle = \langle v, w_i\rangle\langle w_i, w_i\rangle = \langle v, w_i\rangle$ for $i = 1, \dots, r$, hence $v - \tilde{w} \in W^\perp$. Since the
expression (3.6) for $v$ is unique, $w = \tilde{w}$ and $w' = v - \tilde{w}$. □
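As an illustration of formula (3.8), here is a minimal sketch for the case $V = \mathbb{R}^3$ with dot product, to which the general case reduces after choosing an orthonormal basis of $V$. The names `dot` and `project`, the plane $W$, and the vector $v$ are assumptions made for the example.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(v, orthonormal_basis):
    """Orthogonal projection of v onto the span of an orthonormal family."""
    result = [0.0] * len(v)
    for w in orthonormal_basis:
        coeff = dot(v, w)  # the coefficient <v, w_i> of formula (3.8)
        result = [r + coeff * wi for r, wi in zip(result, w)]
    return result

s = 1 / math.sqrt(2)
W = [[s, s, 0.0], [0.0, 0.0, 1.0]]  # orthonormal basis of a plane in R^3
v = [3.0, 1.0, 5.0]

pv = project(v, W)
residual = [a - b for a, b in zip(v, pv)]
print(pv, residual, [dot(residual, w) for w in W])
```

The residual $v - \pi(v)$ has vanishing dot product with each basis vector of $W$, up to rounding error, which is the content of the proposition.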

The case $W = V$ is also important. In this case, $\pi$ is the identity map.

(3.9) Corollary. Let $B = (v_1, \dots, v_n)$ be an orthonormal basis for a Euclidean
space $V$. Then

$v = \langle v, v_1\rangle v_1 + \cdots + \langle v, v_n\rangle v_n$.

In other words, the coordinate vector of $v$ with respect to the orthonormal basis $B$ is
$X = (\langle v, v_1\rangle, \dots, \langle v, v_n\rangle)^t$. □

4. HERMITIAN FORMS 

In this section we assume that our scalar field is the field $\mathbb{C}$ of complex numbers.
When working with complex vector spaces, it is desirable to have an analogue of the
concept of the length of a vector, and of course one can define length on $\mathbb{C}^n$ by
identifying it with $\mathbb{R}^{2n}$. If $X = (x_1, \dots, x_n)^t$ is a complex vector and if $x_r = a_r + b_ri$, then
the length of $X$ is

(4.1)   $|X| = \sqrt{a_1^2 + b_1^2 + \cdots + a_n^2 + b_n^2} = \sqrt{\bar{x}_1x_1 + \cdots + \bar{x}_nx_n}$,

where the bar denotes complex conjugation. This formula suggests that dot product
is "wrong" for complex vectors and that we should define a product by the formula

(4.2)   $\langle X, Y\rangle = \bar{X}^tY = \bar{x}_1y_1 + \cdots + \bar{x}_ny_n$.

This product has the positivity property:

(4.3)   $\langle X, X\rangle$ is a positive real number if $X \neq 0$.

Moreover, (4.2) agrees with dot product for real vectors.

The product (4.2) is called the standard hermitian product, or the hermitian
dot product. It has these properties:

(4.4)
Linearity in the second variable:

$\langle X, cY\rangle = c\langle X, Y\rangle$ and $\langle X, Y_1 + Y_2\rangle = \langle X, Y_1\rangle + \langle X, Y_2\rangle$;

Conjugate linearity in the first variable:

$\langle cX, Y\rangle = \bar{c}\langle X, Y\rangle$ and $\langle X_1 + X_2, Y\rangle = \langle X_1, Y\rangle + \langle X_2, Y\rangle$;

Hermitian symmetry:

$\langle Y, X\rangle = \overline{\langle X, Y\rangle}$.

So we can have a positive definite product at a small cost in linearity and symmetry.

When one wants to work with notions involving length, the hermitian product
is the right one, though symmetric bilinear forms on complex vector spaces also
come up in applications.
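The properties (4.3) and (4.4) can be checked on sample vectors with Python's built-in complex numbers. This is only a sketch: the function name and the vectors `X`, `Y` and scalar `c` are arbitrary choices made for illustration.

```python
def hermitian_product(X, Y):
    """Standard hermitian product (4.2): conjugate the entries of X."""
    return sum(x.conjugate() * y for x, y in zip(X, Y))

X = [1 + 2j, 3 - 1j]
Y = [2 - 1j, 1j]
c = 2 + 3j

xy = hermitian_product(X, Y)
print(hermitian_product(X, X))                      # real and positive
print(hermitian_product(X, [c * y for y in Y]), c * xy)
print(hermitian_product([c * x for x in X], Y), c.conjugate() * xy)
print(hermitian_product(Y, X), xy.conjugate())
```

Each of the last three lines prints a pair of equal numbers, corresponding to linearity in the second variable, conjugate linearity in the first, and hermitian symmetry.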

If $V$ is a complex vector space, a hermitian form on $V$ is any function of two
variables

(4.5)   $V \times V \to \mathbb{C}$, $\quad v, w \mapsto \langle v, w\rangle$,

satisfying the relations (4.4). Let $B = (v_1, \dots, v_n)$ be a basis for $V$. Then the matrix
of the form is defined in the analogous way as the matrix of a bilinear form:

$A = (a_{ij})$, where $a_{ij} = \langle v_i, v_j\rangle$.

The formula for the form now becomes

(4.6)   $\langle v, w\rangle = \bar{X}^tAY$,

if $v = BX$ and $w = BY$.

The matrix $A$ is not arbitrary, because hermitian symmetry implies that

$a_{ij} = \langle v_i, v_j\rangle = \overline{\langle v_j, v_i\rangle} = \bar{a}_{ji}$,

that is, that $A = \bar{A}^t$. Let us introduce the adjoint of a matrix $A$ [different from the
one defined in Chapter 1 (5.4)] as

(4.7)   $A^* = \bar{A}^t$.

It satisfies the following rules:

$(A + B)^* = A^* + B^*$
$(AB)^* = B^*A^*$
$(A^*)^{-1} = (A^{-1})^*$
$A^{**} = A$.

These rules are easy to check. Formula (4.6) can now be rewritten as

(4.8)   $\langle v, w\rangle = X^*AY$,

and the standard hermitian product on $\mathbb{C}^n$ becomes $\langle X, Y\rangle = X^*Y$.

A matrix $A$ is called hermitian or self-adjoint if

(4.9)   $A = A^*$,

and it is the hermitian matrices which are matrices of hermitian forms. Their entries
satisfy $a_{ji} = \bar{a}_{ij}$. This implies that the diagonal entries are real and that the entries
below the diagonal are complex conjugates of those above it: A matrix

$A = \begin{bmatrix} r_1 & & a_{ij} \\ & \ddots & \\ \bar{a}_{ij} & & r_n \end{bmatrix}$, $\quad r_i \in \mathbb{R}$, $a_{ij} \in \mathbb{C}$,

is a hermitian matrix.

Note that the condition for a real matrix to be hermitian is $a_{ji} = a_{ij}$:

(4.10)   The real hermitian matrices are the real symmetric matrices.


The discussion of change of basis in Sections 1 and 2 has analogues for hermitian
forms. Given a hermitian form, a change of basis by a matrix $P$ leads as in
(1.12) to

$X'^*A'Y' = (PX)^*A'PY = X^*(P^*A'P)Y$.

Hence the new matrix $A'$ satisfies

(4.11)   $A = P^*A'P$ or $A' = (P^*)^{-1}AP^{-1}$.

Since $P$ is arbitrary, we can replace it by $Q = (P^*)^{-1}$ to obtain the description
analogous to (1.13):

(4.12) Corollary. Let $A$ be the matrix of a hermitian form with respect to a basis.
The matrices which represent the same hermitian form with respect to different
bases are those of the form $A' = QAQ^*$, for some invertible matrix $Q \in GL_n(\mathbb{C})$. □
For hermitian forms, the analogues of orthogonal matrices are the unitary
matrices. A matrix $P$ is called unitary if it satisfies the condition

(4.13)   $P^*P = I$ or $P^* = P^{-1}$.

is a unitary matrix. 

Note that for a real matrix $P$, this condition becomes $P^tP = I$:

(4.14)   The real unitary matrices are the real orthogonal matrices.

The unitary matrices form a group, the unitary group $U_n$:

(4.15)   $U_n = \{P \mid P^*P = I\}$.

Formula (4.11) tells us that unitary matrices represent changes of basis which leave
the standard hermitian product $X^*Y$ invariant:

(4.16) Corollary. A change of basis preserves the standard hermitian product,
that is, $X^*Y = X'^*Y'$, if and only if its matrix $P$ is unitary. □

But Corollary (4.12) tells us that a general change of basis changes the standard
hermitian product $X^*Y$ to $X'^*A'Y'$, where $A' = QQ^*$ and $Q \in GL_n(\mathbb{C})$.
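Condition (4.13) and Corollary (4.16) can be tested numerically. In the sketch below, the matrix `P` is an assumed example of a unitary matrix and `Q` an assumed example of an invertible matrix that is not unitary; neither is taken from the text, and the helper names are hypothetical.

```python
import math

def adjoint(P):
    """Conjugate transpose P* of a square complex matrix."""
    n = len(P)
    return [[P[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def apply(P, X):
    return [sum(p * x for p, x in zip(row, X)) for row in P]

def hermitian_product(X, Y):
    return sum(x.conjugate() * y for x, y in zip(X, Y))

def is_unitary(P, tol=1e-12):
    """Check P*P = I up to rounding error."""
    G = matmul(adjoint(P), P)
    n = len(P)
    return all(abs(G[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
P = [[s, s * 1j], [s * 1j, s]]   # assumed example of a unitary matrix
Q = [[1, 1j], [0, 1]]            # invertible, but Q*Q is not the identity
X, Y = [1 + 2j, 3 - 1j], [2 - 1j, 1j]

print(is_unitary(P), is_unitary(Q))
print(hermitian_product(X, Y), hermitian_product(apply(P, X), apply(P, Y)))
print(hermitian_product(apply(Q, X), apply(Q, Y)))
```

The product is unchanged by `P`, up to rounding, but not by `Q`.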

The notion of orthogonality for hermitian forms is defined exactly as for
symmetric bilinear forms: $v$ is called orthogonal to $w$ if $\langle v, w\rangle = 0$. Since
$\langle v, w\rangle = \overline{\langle w, v\rangle}$, orthogonality is still a symmetric relation. We can now copy the
discussion of Sections 1 and 2 for hermitian forms without essential change, and
Sylvester's Law (2.11) for real symmetric forms carries over to the hermitian case. In
particular, we can speak of positive definite forms, those having the property that

(4.17)   $\langle v, v\rangle$ is a positive real number if $v \neq 0$,

and of orthonormal bases $B = (v_1, \dots, v_n)$, those such that

(4.18)   $\langle v_i, v_i\rangle = 1$ and $\langle v_i, v_j\rangle = 0$ if $i \neq j$.

(4.19) Theorem. Let $\langle\ ,\ \rangle$ be a hermitian form on a complex vector space $V$. There
is an orthonormal basis for $V$ if and only if the form is positive definite.

(4.20) Proposition. Let $W$ be a subspace of a hermitian space $V$. If the restriction
of the form to $W$ is nondegenerate, then $V = W \oplus W^\perp$.

The proofs of these facts are left as exercises. □


For example ， 


V2 1 - 
5. THE SPECTRAL THEOREM

In this section we will study an $n$-dimensional complex vector space $V$ and a positive
definite hermitian form $\langle\ ,\ \rangle$ on $V$. A complex vector space on which a positive
definite hermitian form is given is often called a hermitian space. You can imagine
that $V$ is $\mathbb{C}^n$, with its standard hermitian product, if you want to. The choice of
an orthonormal basis in $V$ will allow such an identification.

Since the form $\langle\ ,\ \rangle$ is given, we will not want to choose an arbitrary basis for $V$
in order to make computations. It is natural to work exclusively with orthonormal
bases. This changes all previous calculations in the following way: It will no longer
be true that the matrix $P$ of a change of basis is an arbitrary invertible matrix.
Rather, if $B = (v_1, \dots, v_n)$, $B' = (v_1', \dots, v_n')$ are two orthonormal bases, then the
matrix $P$ relating them will be unitary. The fact that the bases are orthonormal means
that the matrix of the form $\langle\ ,\ \rangle$ with respect to each basis is the identity $I$, and so
(4.11) reads $I = P^*IP$, or $P^*P = I$.

We are going to study a linear operator

(5.1)   $T: V \to V$

on our space. Let $B$ be an orthonormal basis, and let $M$ be the associated matrix of $T$.
A change of orthonormal basis changes $M$ to $M' = PMP^{-1}$ [Chapter 4 (3.4)] where $P$
is unitary; hence

(5.2)   $M' = PMP^*$.

(5.3) Proposition. Let $T$ be a linear operator on a hermitian space $V$, and let $M$ be
the matrix of $T$ with respect to an orthonormal basis $B$.

(a) The matrix $M$ is hermitian if and only if $\langle v, Tw\rangle = \langle Tv, w\rangle$ for all $v, w \in V$.
If so, $T$ is called a hermitian operator.

(b) The matrix $M$ is unitary if and only if $\langle v, w\rangle = \langle Tv, Tw\rangle$ for all $v, w \in V$.
If so, $T$ is called a unitary operator.

Proof. Let $X, Y$ be the coordinate vectors of $v, w$: $v = BX$, $w = BY$, so that
$\langle v, w\rangle = X^*Y$ and $Tv = BMX$. Then $\langle v, Tw\rangle = X^*MY$, and $\langle Tv, w\rangle = X^*M^*Y$. So if
$M = M^*$, then $\langle v, Tw\rangle = \langle Tv, w\rangle$ for all $v, w$; that is, $T$ is hermitian. Conversely, if
$T$ is hermitian, we set $v = e_i$, $w = e_j$ as in the proof of (1.9) to obtain, for the
entries $b_{ij}$ of $M$,

$b_{ij} = e_i^*(Me_j) = (Me_i)^*e_j = \bar{b}_{ji}$.

Thus $M = M^*$. Similarly, $\langle v, w\rangle = X^*Y$ and
$\langle Tv, Tw\rangle = X^*M^*MY$, so $\langle v, w\rangle = \langle Tv, Tw\rangle$ for all $v, w$ if and only if $M^*M = I$. □

(5.4) Theorem. Spectral Theorem:

(a) Let $T$ be a hermitian operator on a hermitian vector space $V$. There is an
orthonormal basis of $V$ consisting of eigenvectors of $T$.

(b) Matrix form: Let $M$ be a hermitian matrix. There is a unitary matrix $P$ such that
$PMP^*$ is a real diagonal matrix.

Proof. Choose an eigenvector $v = v_1$, and normalize so that its length is 1:
$\langle v, v\rangle = 1$. Extend to an orthonormal basis. Then the matrix of $T$ becomes

$M = \begin{bmatrix} a & * & \cdots & * \\ 0 & & & \\ \vdots & & N & \\ 0 & & & \end{bmatrix}$.

Since $T$ is hermitian, so is the matrix $M$ (5.3). This implies that $* \cdots * = 0 \cdots 0$ and
that $N$ is hermitian. Proceed by induction. □

To diagonalize a hermitian matrix $M$ by a unitary $P$, one can proceed by
determining the eigenvectors. If the eigenvalues are distinct, the corresponding
eigenvectors will be orthogonal. This follows from the Spectral Theorem. Let $B'$ be the
orthonormal basis obtained by normalizing the lengths of the eigenvectors to 1.
Then $P = [B']^{-1}$ [Chapter 3 (4.20)].

For example, let 


M = 



2 




The eigenvalues of this matrix are 3, 1, and the vectors


Vi 





are eigenvectors with these eigenvalues. We normalize their lengths to 1 by the
factor $1/\sqrt{2}$. Then



and PMP^ 


But the Spectral Theorem asserts that a hermitian matrix can be diagonalized
even if its eigenvalues aren't distinct. This statement becomes particularly simple for
$2 \times 2$ matrices: If the characteristic polynomial of a $2 \times 2$ hermitian matrix $M$ has a
double root, then there is a unitary matrix $P$ such that $PMP^* = aI$. Bringing the $P$'s
over to the other side of the equation, we obtain $M = P^*aIP = aP^*P = aI$. So it
follows from the Spectral Theorem that $M = aI$: the only $2 \times 2$ hermitian matrices
whose characteristic polynomials have a double root are the matrices $aI$, where $a$ is
a real number. We can verify this fact directly from the definition. We write

$M = \begin{bmatrix} a & \beta \\ \bar{\beta} & d \end{bmatrix}$,

where $a, d$ are real and $\beta$ is complex. Then the characteristic polynomial
is $t^2 - (a + d)t + (ad - \beta\bar{\beta})$. This polynomial has a double root if and only
if its discriminant vanishes, that is, if

$(a + d)^2 - 4(ad - \beta\bar{\beta}) = (a - d)^2 + 4\beta\bar{\beta} = 0$.

Both of the terms $(a - d)^2$ and $\beta\bar{\beta}$ are nonnegative real numbers. So if the
discriminant vanishes, then $a = d$ and $\beta = 0$. In this case, $M = aI$, as predicted.

Here is an interesting consequence of the Spectral Theorem for which we can
give a direct proof:

(5.5) Proposition. The eigenvalues of a hermitian operator $T$ are real numbers.

Proof. Let $a$ be an eigenvalue, and let $v$ be an eigenvector for $T$ such that
$T(v) = av$. Then by (5.3) $\langle Tv, v\rangle = \langle v, Tv\rangle$; hence $\langle av, v\rangle = \langle v, av\rangle$. By conjugate
linearity (4.4),

$\bar{a}\langle v, v\rangle = \langle av, v\rangle = \langle v, av\rangle = a\langle v, v\rangle$,

and $\langle v, v\rangle \neq 0$ because the form $\langle\ ,\ \rangle$ is positive definite. Hence $a = \bar{a}$. This shows
that $a$ is real. □

The results we have proved for hermitian matrices have analogues for real
symmetric matrices. Let $V$ be a real vector space with a positive definite bilinear
form $\langle\ ,\ \rangle$. Let $T$ be a linear operator on $V$.

(5.6) Proposition. Let $M$ be the matrix of $T$ with respect to an orthonormal basis.

(a) The matrix $M$ is symmetric if and only if $\langle v, Tw\rangle = \langle Tv, w\rangle$ for all $v, w \in V$.
If so, $T$ is called a symmetric operator.

(b) The matrix $M$ is orthogonal if and only if $\langle v, w\rangle = \langle Tv, Tw\rangle$ for all $v, w \in V$.
If so, $T$ is called an orthogonal operator. □

(5.7) Proposition. The eigenvalues of a real symmetric matrix are real.

Proof. A real symmetric matrix is hermitian. So this is a special case of (5.5). □

(5.8) Theorem. Spectral Theorem (real case):

(a) Let $T$ be a symmetric operator on a real vector space $V$ with a positive definite
bilinear form. There is an orthonormal basis of eigenvectors of $T$.

(b) Matrix form: Let $M$ be a real symmetric $n \times n$ matrix. There is an orthogonal
matrix $P \in O_n(\mathbb{R})$ such that $PMP^t$ is diagonal.

Proof. Now that we know that the eigenvalues of such an operator are real, we
can copy the proof of (5.4). □

6. CONICS AND QUADRICS

A conic is the locus in the plane $\mathbb{R}^2$ defined by a quadratic equation in two variables,
of the form

(6.1)   $f(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 + b_1x_1 + b_2x_2 + c = 0$.

More precisely, the locus (6.1) is a conic, meaning an ellipse, a hyperbola, or a
parabola, or else it is called degenerate. A degenerate conic can be a pair of lines, a
single line, a point, or empty, depending on the particular equation. The term
quadric is used to designate the analogous loci in three or more dimensions.

The quadratic part of $f(x_1, x_2)$ is called a quadratic form:

(6.2)   $q(x_1, x_2) = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2$.

In general, a quadratic form in $n$ variables $x_1, \dots, x_n$ is a polynomial each of whose
terms has degree 2 in the variables.

It is convenient to express the form $q(x_1, x_2)$ in matrix notation. To do this, we
introduce the symmetric matrix

(6.3)   $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix}$.

Then $q(x_1, x_2) = X^tAX$, where $X$ denotes the column vector $(x_1, x_2)^t$. We also
introduce the row vector $B = (b_1, b_2)$. Then equation (6.1) can be written in matrix
notation as

(6.4)   $X^tAX + BX + c = 0$.

We put the coefficient 2 into formulas (6.1) and (6.2) in order to avoid some
coefficients $\frac{1}{2}$ in the matrix (6.3). An alternative way to write the quadratic form
would be

$q(x_1, x_2) = a_{11}x_1^2 + a_{12}x_1x_2 + a_{12}x_2x_1 + a_{22}x_2^2$.

We propose to describe the congruence classes of conics as geometric figures
or, what is the same, their orbits under the action of the group $M$ of rigid motions of
the plane. A rigid motion will produce a change of variable in equation (6.1).

(6.5) Theorem. Every nondegenerate conic is congruent to one of the following:

(i) Ellipse: $a_{11}x_1^2 + a_{22}x_2^2 - 1 = 0$,

(ii) Hyperbola: $a_{11}x_1^2 - a_{22}x_2^2 - 1 = 0$,

(iii) Parabola: $a_{11}x_1^2 - x_2 = 0$, where $a_{11}, a_{22} > 0$.

Proof. We simplify equation (6.1) in two steps, first applying an orthogonal
transformation (a rotation or reflection) to diagonalize $A$ and then applying a
translation to eliminate, as much as possible, the linear and constant terms $BX + c$.

By the Spectral Theorem (5.8), there is an orthogonal matrix $P$ such that $PAP^t$
is diagonal. We make the change of variable $X' = PX$, or $X = P^tX'$. Substitution into
equation (6.4) yields

(6.6)   $X'^t(PAP^t)X' + (BP^t)X' + c = 0$.

Hence there is an orthogonal change of variable such that the quadratic form
becomes diagonal, that is, the coefficient $a_{12}$ of $x_1x_2$ is zero.

Suppose that $A$ is diagonal. Then $f$ has the form

$f(x_1, x_2) = a_{11}x_1^2 + a_{22}x_2^2 + b_1x_1 + b_2x_2 + c = 0$.

We eliminate $b_i$ by completing the squares, making the substitution

(6.7)   $x_i = x_i' - \dfrac{b_i}{2a_{ii}}$.

This substitution results in

(6.8)   $f(x_1', x_2') = a_{11}x_1'^2 + a_{22}x_2'^2 + c'$,

where $c'$ is a number which can be determined if desired. This substitution
corresponds to translation by the vector $(b_1/2a_{11}, b_2/2a_{22})^t$, and we can make it provided
$a_{11}, a_{22}$ are not zero.

If $a_{ii} = 0$ but $b_i \neq 0$, then we can use the substitution

(6.9)   $x_i = x_i' - c/b_i$

to eliminate the constant term instead. We may normalize one coefficient to -1.
Doing so and eliminating degenerate conics leaves us with the three cases listed in the
theorem. It is not difficult to show that a change of the coefficients $a_{11}, a_{22}$ results in
a different congruence class, except for the interchange of $a_{11}, a_{22}$ in the equation of
an ellipse. □

The method used above can be applied in any number of variables to classify
quadrics in $n$ dimensions. The general quadratic equation has the form

(6.10)   $f(x_1, \dots, x_n) = \sum_i a_{ii}x_i^2 + \sum_{i<j} 2a_{ij}x_ix_j + \sum_i b_ix_i + c = 0$.

We could also write this equation more compactly as

(6.11)   $f(x_1, \dots, x_n) = \sum_{i,j} a_{ij}x_ix_j + \sum_i b_ix_i + c = 0$,

where the first sum is over all pairs of indices, and where we set $a_{ji} = a_{ij}$.
We define the matrices $A, B$ to be

$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{1n} & \cdots & a_{nn} \end{bmatrix}$, $\quad B = (b_1, \dots, b_n)$.

Then the quadratic form is

(6.12)   $q(x_1, \dots, x_n) = X^tAX$,

and

(6.13)   $f(x_1, \dots, x_n) = X^tAX + BX + c$.

By a suitable orthogonal transformation $P$, the quadric is carried to (6.6), where
$PAP^t$ is diagonal. When $A$ is diagonal, linear terms are eliminated by the translation
(6.7), or else (6.9) is used.

Here is the classification in three variables:

(6.14) Theorem. The congruence classes of nondegenerate quadrics in $\mathbb{R}^3$ are
represented by

(i) Ellipsoids: $a_{11}x_1^2 + a_{22}x_2^2 + a_{33}x_3^2 - 1 = 0$,

(ii) 1-sheeted hyperboloids: $a_{11}x_1^2 + a_{22}x_2^2 - a_{33}x_3^2 - 1 = 0$,

(iii) 2-sheeted hyperboloids: $a_{11}x_1^2 - a_{22}x_2^2 - a_{33}x_3^2 - 1 = 0$,

(iv) Elliptic paraboloids: $a_{11}x_1^2 + a_{22}x_2^2 - x_3 = 0$,

(v) Hyperbolic paraboloids: $a_{11}x_1^2 - a_{22}x_2^2 - x_3 = 0$,

where $a_{11}, a_{22}, a_{33} > 0$. □


If a quadratic equation $f(x_1, x_2) = 0$ is given, we can determine the type of
conic it represents most easily by allowing nonorthogonal changes of coordinates.
For example, if the associated quadratic form $q$ is positive definite, then the conic is
either an ellipse, or else it is degenerate (a single point or empty). To distinguish
these cases, arbitrary changes of coordinates are permissible. A nonorthogonal
coordinate change will distort the conic, but it will not change an ellipse into a
hyperbola or a degenerate conic.

As an example, consider the locus

(6.15)   $x_1^2 + x_1x_2 + x_2^2 + 4x_1 + 3x_2 + 4 = 0$.

The associated matrix is

$A = \begin{bmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 1 \end{bmatrix}$,

which is positive definite by (1.25). We diagonalize $A$ by the nonorthogonal
substitution $X = P^tX'$ [compare (6.6)], where

$P = \begin{bmatrix} 1 & 0 \\ -\frac{1}{2} & 1 \end{bmatrix}$, $\quad PAP^t = \begin{bmatrix} 1 & \\ & \frac{3}{4} \end{bmatrix}$, $\quad BP^t = (4, 1)$,

to obtain

$x_1'^2 + \frac{3}{4}x_2'^2 + 4x_1' + x_2' + 4 = 0$.

Completing the square yields

$x_1''^2 + \frac{3}{4}x_2''^2 - \frac{1}{3} = 0$,

an ellipse. Thus (6.15) represents an ellipse too. On the other hand, if we change the
constant term of (6.15) to 5, the locus becomes empty.
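The last step of this example can be reproduced with exact arithmetic. The sketch below assumes the quadratic part has already been diagonalized with positive coefficients, computes the constant $c'$ of (6.8) left after the translation (6.7), and reads off the type of the locus from its sign; the function names are hypothetical.

```python
from fractions import Fraction

def complete_squares(a11, a22, b1, b2, c):
    """Constant c' left after the translation (6.7).

    Applies to a11*x1^2 + a22*x2^2 + b1*x1 + b2*x2 + c with a11, a22 != 0.
    """
    return c - Fraction(b1 * b1) / (4 * a11) - Fraction(b2 * b2) / (4 * a22)

def classify_definite(c_prime):
    """Type of a11*x1^2 + a22*x2^2 + c' = 0, assuming a11, a22 > 0."""
    if c_prime < 0:
        return "ellipse"
    if c_prime == 0:
        return "point"
    return "empty"

# Diagonalized form of (6.15): x1'^2 + (3/4) x2'^2 + 4 x1' + x2' + 4 = 0.
c_prime = complete_squares(1, Fraction(3, 4), 4, 1, 4)
print(c_prime, classify_definite(c_prime))

# The same equation with the constant term changed to 5.
c_prime_5 = complete_squares(1, Fraction(3, 4), 4, 1, 5)
print(c_prime_5, classify_definite(c_prime_5))
```

The first call recovers the constant $-\frac{1}{3}$ of the equation above; with constant term 5 the remaining constant is positive, so a sum of two squares with positive coefficients would have to be negative, and the locus is empty.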
7. THE SPECTRAL THEOREM FOR NORMAL OPERATORS

The Spectral Theorem (5.4) tells us that any hermitian matrix $M$ can be transformed
into a real diagonal matrix $D$ by a unitary matrix $P$: $D = PMP^*$. We now ask for the
matrices $M$ which can be transformed in the same way to a diagonal matrix $D$, but
where we no longer require $D$ to be real. It turns out that there is an elegant formal
characterization of such matrices.

(7.1) Definition. A matrix $M$ is called normal if it commutes with its adjoint, that
is, if $MM^* = M^*M$.

(7.2) Lemma. If $M$ is normal and $P$ is unitary, then $M' = PMP^*$ is also normal,
and conversely.

Proof. Assume that $M$ is normal. Then
$M'M'^* = PMP^*(PMP^*)^* = PMM^*P^* = PM^*MP^* = (PMP^*)^*(PMP^*) = M'^*M'$.
So $PMP^*$ is normal. The converse follows by replacing $P$ by $P^*$. □

This lemma allows us to define a normal operator $T: V \to V$ on a hermitian
space $V$ to be a linear operator whose matrix $M$ with respect to any orthonormal basis
is a normal matrix.

(7.3) Theorem. A complex matrix $M$ is normal if and only if there is a unitary
matrix $P$ such that $PMP^*$ is diagonal. □

The most important normal matrices, aside from hermitian ones, are unitary
matrices: Since $M^* = M^{-1}$ if $M$ is unitary, $MM^* = M^*M = I$, which shows that $M$ is
normal.

(7.4) Corollary. Every conjugacy class in the unitary group contains a diagonal 
matrix. □ 

Proof of Theorem (7.3). First, any two diagonal matrices commute, so a diagonal
matrix is normal: $DD^* = D^*D$. The lemma tells us that M is normal if
$PMP^* = D$. Conversely, suppose that M is normal. Choose an eigenvector $v = v_1$ of
M, and normalize so that $\langle v, v\rangle = 1$, as in the proof of (5.4). Extend $\{v_1\}$ to an
orthonormal basis. Then M will be changed to a matrix

$M_1 = PMP^* = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & & & \\ \vdots & & N & \\ 0 & & & \end{bmatrix}.$

The upper left entry of $M_1^*M_1$ is $\bar a_{11}a_{11}$, while the same entry of $M_1M_1^*$ is
$a_{11}\bar a_{11} + a_{12}\bar a_{12} + \cdots + a_{1n}\bar a_{1n}$. Since M is normal, so is $M_1$, that is, $M_1^*M_1 = M_1M_1^*$.
It follows that $a_{12}\bar a_{12} + \cdots + a_{1n}\bar a_{1n} = 0$. Since $a_{1j}\bar a_{1j} \ge 0$, this shows that the entries $a_{1j}$
with $j > 1$ are zero and that

$M_1 = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & N & \\ 0 & & & \end{bmatrix}.$

We continue, working on N. □
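
Definition (7.1) is easy to test on small examples. The following sketch, with hypothetical helper names and matrices chosen only for illustration, checks $MM^* = M^*M$ for a rotation matrix (unitary, with non-real eigenvalues), a hermitian matrix, and a shear, which is not diagonalizable and so cannot be normal.

```python
def adjoint(M):
    """Conjugate transpose M* of a square matrix given as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_normal(M):
    """True if M M* equals M* M (exact for small integer or Gaussian-integer entries)."""
    Ms = adjoint(M)
    return matmul(M, Ms) == matmul(Ms, M)

rotation = [[0, -1], [1, 0]]             # unitary, eigenvalues i and -i
hermitian = [[2, 1 - 1j], [1 + 1j, 3]]   # equal to its own adjoint
shear = [[1, 1], [0, 1]]                 # not diagonalizable

print(is_normal(rotation), is_normal(hermitian), is_normal(shear))
```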


8. SKEW-SYMMETRIC FORMS 


The theory of skew-symmetric forms is independent of the field of scalars. One
might expect trouble with fields of characteristic 2, in which 1 + 1 = 0. They look
peculiar because a = -a for all a, so the conditions for symmetry (1.5) and for
skew symmetry (1.6) are the same. It turns out that fields of characteristic 2 don’t
cause trouble with skew-symmetric forms, if the definition of skew symmetry is
changed to handle them. The definition which works for all fields is this:


(8.1) Definition. A bilinear form $\langle\ ,\ \rangle$ on a vector space V is skew-symmetric if

$\langle v, v\rangle = 0$

for all $v \in V$.

The rule

(8.2) $\langle v, w\rangle = -\langle w, v\rangle$

for all $v, w \in V$ continues to hold with this definition. It is proved by expanding

$\langle v + w, v + w\rangle = \langle v, v\rangle + \langle v, w\rangle + \langle w, v\rangle + \langle w, w\rangle,$

and by using the fact that $\langle v, v\rangle = \langle w, w\rangle = \langle v + w, v + w\rangle = 0$. If the characteristic
of the field of scalars is not 2, then (8.1) and (8.2) are equivalent. For if (8.2)
holds for all v, w, then setting w = v we find $\langle v, v\rangle = -\langle v, v\rangle$. This implies that
$2\langle v, v\rangle = 0$, hence that $\langle v, v\rangle = 0$ unless 2 = 0 in the field.

Note that if F has characteristic 2, then 1 = -1 in F, so (8.2) shows that the
form is actually symmetric. But most symmetric forms don’t satisfy (8.1).
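
The last remark can be made concrete over the field with two elements. The sketch below is a brute-force illustration with hypothetical names: it lists every symmetric 2 x 2 matrix over $\mathbb{F}_2$ and keeps those whose form satisfies (8.1).

```python
from itertools import product

def satisfies_8_1(a, b, d):
    """Check <v, v> = 0 for every v in F_2^2, for the symmetric matrix [[a, b], [b, d]]."""
    for v1, v2 in product((0, 1), repeat=2):
        value = (a * v1 * v1 + 2 * b * v1 * v2 + d * v2 * v2) % 2
        if value != 0:
            return False
    return True

symmetric = list(product((0, 1), repeat=3))          # all triples (a, b, d)
skew = [m for m in symmetric if satisfies_8_1(*m)]   # those satisfying (8.1)

print(len(symmetric), skew)
```

Only the matrices with zero diagonal survive, in agreement with the condition $a_{ii} = 0$ that appears in the matrix description of skew-symmetric forms.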

The matrix A of a skew-symmetric form with respect to an arbitrary basis is
characterized by the properties

(8.3) $a_{ii} = 0$ and $a_{ij} = -a_{ji}$, if $i \ne j$.

We take these properties as the definition of a skew-symmetric matrix. If the
characteristic is not 2, then this is equivalent with the condition

(8.4) $A^t = -A$.

(8.5) Theorem.

(a) Let V be a vector space of dimension m over a field F, and let $\langle\ ,\ \rangle$ be a nondegenerate
skew-symmetric form on V. Then m is an even integer, and there is a
basis B of V such that the matrix A of the form with respect to that basis is

$J_{2n} = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix},$

where 0, I denote the n x n matrices and $n = \tfrac12 m$.

(b) Matrix form: Let A be a nonsingular skew-symmetric m x m matrix. Then m is
even, and there is a matrix $Q \in GL_m(F)$ such that $QAQ^t$ is the matrix $J_{2n}$.

A basis B as in (8.6a) is called a standard symplectic basis. Note that rearranging
the standard symplectic basis in the order $(v_1, v_{n+1}, v_2, v_{n+2}, \ldots, v_n, v_{2n})$ changes
the matrix $J_{2n}$ into a matrix made up of 2 x 2 blocks

$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$

along the diagonal. This is the form which is most convenient for proving the theorem.
We leave the proof as an exercise. □
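
The effect of rearranging a standard symplectic basis can be checked mechanically. The sketch below uses hypothetical helper names and takes n = 3 as an arbitrary example: it builds $J_{2n}$, permutes the basis in the interleaved order, and produces the block-diagonal matrix.

```python
def J(n):
    """The 2n x 2n matrix with blocks [[0, I], [-I, 0]]."""
    M = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        M[i][n + i] = 1
        M[n + i][i] = -1
    return M

def reorder(M, order):
    """Matrix of the same form after permuting the basis:
    entry (i, j) becomes the value of the form on (v_order[i], v_order[j])."""
    return [[M[r][c] for c in order] for r in order]

def interleaved(n):
    """0-based indices for the order v_1, v_{n+1}, v_2, v_{n+2}, ..., v_n, v_{2n}."""
    order = []
    for i in range(n):
        order += [i, n + i]
    return order

blocks = reorder(J(3), interleaved(3))
for row in blocks:
    print(row)
```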


9. SUMMARY OF RESULTS, IN MATRIX NOTATION 

Real numbers: A square matrix A is symmetric if $A^t = A$ and orthogonal if
$A^t = A^{-1}$.

(1) Spectral Theorem: If A is a real symmetric matrix, there is an orthogonal matrix
P such that $PAP^t$ $(= PAP^{-1})$ is diagonal.

(2) If A is a real symmetric matrix, there is a real invertible matrix P such that

$PAP^t = \begin{bmatrix} I_p & & \\ & -I_m & \\ & & 0_z \end{bmatrix}$

for some integers p, m, z.

(3) Sylvester’s Law: The numbers p, m, z are determined by the matrix A.

Complex numbers: A complex square matrix A is hermitian if $A^* = A$, unitary
if $A^* = A^{-1}$, and normal if $AA^* = A^*A$.

(1) Spectral Theorem: If A is a hermitian matrix, there is a unitary matrix P such
that $PAP^*$ is a real diagonal matrix.

(2) If A is a normal matrix, there is a unitary matrix P such that $PAP^*$ is diagonal.

F arbitrary: A square n x n matrix is skew-symmetric if $a_{ii} = 0$ and $a_{ij} = -a_{ji}$
for all i, j. If A is an invertible skew-symmetric matrix, then n is even, and there
is an invertible matrix P so that $PAP^t$ has the form

$\begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}.$

(9.1) Note. The rule $A' = (P^t)^{-1}A(P^{-1})$ for change of basis in a bilinear form (see
(1.12)) is rather ugly because of the way the matrix P of change of coordinates is
defined. It is possible to rearrange equations (4.17) of Chapter 3, by writing

(9.2) $v_i' = \sum_j q_{ij}v_j$, or $B'^t = QB^t$.

This results in $Q = (P^{-1})^t$, and with this rule we obtain the nicer formula

$A' = QAQ^t$

to replace (1.12). We can use it if we want to.

The problem with formula (9.2) is that change of basis on a linear transformation
gets messed up; namely the formula $A' = PAP^{-1}$ [Chapter 4 (3.4)] is replaced
by $A' = (Q^{-1})^tAQ^t$. Trying to keep the formulas neat is like trying to smooth a bump
in a rug.

This brings up an important point. Linear operators on V and bilinear forms on
V are each given by an n x n matrix A, once a basis has been chosen. One is tempted
to think that the theories of linear operators and of bilinear forms are somehow
equivalent, but they are not, unless a basis is fixed. For under a change of basis the
matrix of a bilinear form changes to $(P^t)^{-1}AP^{-1}$ (1.12), while the matrix of a linear
operator changes to $PAP^{-1}$ [Chapter 4 (3.4)]. So the new matrices are no longer
equal. To be precise, this shows that the theories diverge when the basis is changed,
unless the matrix P of change of basis happens to be orthogonal. If P is orthogonal,
then $P = (P^t)^{-1}$, and we are all right. The matrices remain equal. This is one benefit
of working with orthonormal bases.
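
The divergence between the two change-of-basis rules is easy to see numerically. The sketch below is a small illustration with hypothetical names and arbitrarily chosen 2 x 2 matrices: it applies both rules to the same A, once with an orthogonal P and once with a shear.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inverse2(P):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def as_form(A, P):
    """(P^t)^{-1} A P^{-1}: change of basis for a bilinear form."""
    Pinv = inverse2(P)
    return matmul(matmul(transpose(Pinv), A), Pinv)

def as_operator(A, P):
    """P A P^{-1}: change of basis for a linear operator."""
    return matmul(matmul(P, A), inverse2(P))

A = [[F(1), F(2)], [F(3), F(4)]]
rotation = [[F(0), F(-1)], [F(1), F(0)]]   # orthogonal
shear = [[F(1), F(1)], [F(0), F(1)]]       # invertible, not orthogonal

print(as_form(A, rotation) == as_operator(A, rotation))
print(as_form(A, shear) == as_operator(A, shear))
```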

Yvonne Verdier 

EXERCISES 

1. Definition of Bilinear Form

1. Let A and B be real n x n matrices. Prove that if $X^tAY = X^tBY$ for all vectors X, Y in $\mathbb{R}^n$,
then A = B.

2. Prove directly that the bilinear form represented by the matrix $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$ is positive
definite if and only if a > 0 and $ad - b^2 > 0$.

3. Apply the Gram-Schmidt procedure to the basis $(1, 1, 0)^t$, $(1, 0, 1)^t$, $(0, 1, 1)^t$ when the
form is dot product.

4. Let $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$. Find an orthonormal basis for $\mathbb{R}^2$ with respect to the form $X^tAY$.

5. (a) Prove that every real square matrix is the sum of a symmetric matrix and a
skew-symmetric matrix ($A^t = -A$) in exactly one way.

(b) Let $\langle\ ,\ \rangle$ be a bilinear form on a real vector space V. Show that there is a symmetric
form $(\ ,\ )$ and a skew-symmetric form $[\ ,\ ]$ so that $\langle\ ,\ \rangle = (\ ,\ ) + [\ ,\ ]$.

6. Let $\langle\ ,\ \rangle$ be a symmetric bilinear form on a vector space V over a field F. The function
$q: V \to F$ defined by $q(v) = \langle v, v\rangle$ is called the quadratic form associated to the bilinear
form. Show how to recover the bilinear form from q, if the characteristic of the field F is
not 2, by expanding $q(v + w)$.

*7. Let X, Y be vectors in $\mathbb{C}^n$, and assume that $X \ne 0$. Prove that there is a symmetric matrix
B such that BX = Y.


2. Symmetric Forms: Orthogonality

1. Prove that a positive definite form is nondegenerate.

2. A matrix A is called positive semidefinite if $X^tAX \ge 0$ for all $X \in \mathbb{R}^n$. Prove that $A^tA$ is
positive semidefinite for any m x n real matrix A.

3. Find an orthogonal basis for the form on $\mathbb{R}^n$ whose matrix is as follows.

(a) $\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$  (b) $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

4. Extend the vector $(1, 1, 1)^t/\sqrt{3}$ to an orthonormal basis for $\mathbb{R}^3$.

*5. Prove that if the columns of an m x n matrix A form an orthonormal basis, then the rows 
do too. 

6. Let A, A' be symmetric matrices related by $A = P^tA'P$, where $P \in GL_n(F)$. Is it true that
the ranks of A and of A' are equal?

7. Let A be the matrix of a symmetric bilinear form $\langle\ ,\ \rangle$ with respect to some basis. Prove or
disprove: The eigenvalues of A are independent of the basis.

8. Prove that the only real matrix which is orthogonal, symmetric, and positive definite is
the identity.

9. The vector space P of all real polynomials of degree < n has a bilinear form, defined by

$\langle f, g\rangle = \int f(x)g(x)\,dx.$

Find an orthonormal basis for P when n has the following values. (a) 1 (b) 2 (c) 3

10. Let V denote the vector space of real n x n matrices. Prove that $\langle A, B\rangle = \operatorname{trace}(A^tB)$ is a
positive definite bilinear form on V. Find an orthonormal basis for this form.

11. A symmetric matrix A is called negative definite if $X^tAX < 0$ for all $X \ne 0$. Give a criterion
analogous to (1.26) for a symmetric matrix A to be negative definite.

12. Prove that every symmetric nonsingular complex matrix A has the form $A = P^tP$.

13. In the notation of (2.12), show by example that the span of $(v_1, \ldots, v_p)$ is not determined
by the form.

14. (a) Let W be a subspace of a vector space V on which a symmetric bilinear form is given.
Prove that $W^\perp$ is a subspace.

(b) Prove that the null space N is a subspace.

15. Let $W_1, W_2$ be subspaces of a vector space V with a symmetric bilinear form. Prove each
of the following.

(a) $(W_1 + W_2)^\perp = W_1^\perp \cap W_2^\perp$ (b) $W \subset W^{\perp\perp}$ (c) If $W_1 \subset W_2$, then $W_1^\perp \supset W_2^\perp$.

16. Prove Proposition (2.7), that $V = W \oplus W^\perp$ if the form is nondegenerate on W.

17. Let $V = \mathbb{R}^{2\times 2}$ be the vector space of real 2 x 2 matrices.

(a) Determine the matrix of the bilinear form $\langle A, B\rangle = \operatorname{trace}(AB)$ on V with respect to the
standard basis $\{e_{ij}\}$.

(b) Determine the signature of this form.

(c) Find an orthogonal basis for this form.

(d) Determine the signature of the form on the subspace of V of matrices with trace
zero.

*18. Determine the signature of the form $\langle A, B\rangle = \operatorname{trace} AB$ on the space $\mathbb{R}^{n\times n}$ of real n x n
matrices.

19. Let $V = \mathbb{R}^{2\times 2}$ be the space of 2 x 2 matrices.

(a) Show that the form $\langle A, B\rangle$ defined by $\langle A, B\rangle = \det(A + B) - \det A - \det B$ is symmetric
and bilinear.

(b) Compute the matrix of this form with respect to the standard basis $\{e_{ij}\}$, and determine
the signature of the form.

(c) Do the same for the subspace of matrices of trace zero.

20. Do exercise 19 for $\mathbb{R}^{3\times 3}$, replacing the quadratic form det A by the coefficient of t in the
characteristic polynomial of A.

21. Decide what the analogue of Sylvester’s Law for symmetric forms over complex vector
spaces is, and prove it.

22. Using the method of proof of Theorem (2.9), find necessary and sufficient conditions on
a field F so that every finite-dimensional vector space V over F with a symmetric bilinear
form $\langle\ ,\ \rangle$ has an orthogonal basis.

23. Let $F = \mathbb{F}_2$, and let A =

(a) Prove that the bilinear form $X^tAY$ on $F^2$ cannot be diagonalized.

(b) Determine the orbits for the action $P, A \mapsto PAP^t$ of $GL_2(F)$ on the space of 2 x 2
matrices with coefficients in F.

3. The Geometry Associated to a Positive Form 

1. Let V be a Euclidean space. Prove the Schwarz Inequality and the Triangle Inequality. 

2. Let W be a subspace of a Euclidean space V. Prove that $W = W^{\perp\perp}$.

3. Let V be a Euclidean space. Show that if |v| = |w|, then $(v + w) \perp (v - w)$. Interpret
this formula geometrically.

4. Prove the parallelogram law $|v + w|^2 + |v - w|^2 = 2|v|^2 + 2|w|^2$ in a Euclidean
space.

5. Prove that the orthogonal projection (3.7) is a linear transformation.

6. Find the matrix of the projection $\pi: \mathbb{R}^3 \to \mathbb{R}^2$ such that the image of the standard basis
of $\mathbb{R}^3$ forms an equilateral triangle and $\pi(e_1)$ points in the direction of the x-axis.

*7. Let W be a two-dimensional subspace of $\mathbb{R}^3$, and consider the orthogonal projection $\pi$ of
$\mathbb{R}^3$ onto W. Let $(a_i, b_i)^t$ be the coordinate vector of $\pi(e_i)$, with respect to a chosen
orthonormal basis of W. Prove that $(a_1, a_2, a_3)$ and $(b_1, b_2, b_3)$ are orthogonal unit vectors.

*8. Let $w \in \mathbb{R}^n$ be a vector of length 1.

(a) Prove that the matrix $P = I - 2ww^t$ is orthogonal.

(b) Prove that multiplication by P is a reflection through the space W orthogonal to w,
that is, prove that if we write an arbitrary vector v in the form $v = cw + w'$, where
$w' \in W$, then $Pv = -cw + w'$.

(c) Let X, Y be arbitrary vectors in $\mathbb{R}^n$ with the same length. Determine a vector w such
that PX = Y.

*9. Use exercise 8 to prove that every orthogonal n x n matrix is a product of at most n
reflections.

10. Let A be a real symmetric matrix, and let T be the linear operator on $\mathbb{R}^n$ whose matrix
is A.

(a) Prove that $(\ker T) \perp (\operatorname{im} T)$ and that $V = (\ker T) \oplus (\operatorname{im} T)$.

(b) Prove that T is an orthogonal projection onto im T if and only if, in addition to being
symmetric, $A^2 = A$.

11. Let A be symmetric and positive definite. Prove that the maximal matrix entries are on 
the diagonal. 

4. Hermitian Forms

1. Verify rules (4.4).

2. Show that the dot product form $(X \cdot Y) = X^tY$ is not positive definite on $\mathbb{C}^n$.

3. Prove that a matrix A is hermitian if and only if the associated form $X^*AX$ is a hermitian
form.

4. Prove that if $X^*AX$ is real for all complex vectors X, then A is hermitian.

5. Prove that the n x n hermitian matrices form a real vector space, and find a basis for that
space.

6. Let V be a two-dimensional hermitian space. Let $(v_1, v_2)$ be an orthonormal basis for V.
Describe all orthonormal bases $(v_1', v_2')$ with $v_1 = v_1'$.

7. Let $X, Y \in \mathbb{C}^n$ be orthogonal vectors. Prove that $|X + Y|^2 = |X|^2 + |Y|^2$.

8. Is $\langle X, Y\rangle = x_1y_1 + ix_1y_2 - ix_2y_1 + ix_2y_2$ on $\mathbb{C}^2$ a hermitian form?

9. Let A, B be positive definite hermitian matrices. Determine which of the following matrices
are positive definite hermitian: $A^2$, $A^{-1}$, AB, A + B.

10. Prove that the determinant of a hermitian matrix is a real number.

11. Prove that A is positive definite hermitian if and only if $A = P^*P$ for some invertible matrix
P.

12. Prove Theorem (4.19), that a hermitian form on a complex vector space V has an
orthonormal basis if and only if it is positive definite.

13. Extend the criterion (1.26) for positive definiteness to hermitian matrices.

14. State and prove an analogue of Sylvester’s Law for hermitian matrices.

15. Let $\langle\ ,\ \rangle$ be a hermitian form on a complex vector space V, and let {v, w} denote the real
part of the complex number $\langle v, w\rangle$. Prove that if V is regarded as a real vector space,
then { , } is a symmetric bilinear form on V, and if $\langle\ ,\ \rangle$ is positive definite, then { , } is too.
What can you say about the imaginary part?

16. Let P be the vector space of polynomials of degree < n.

(a) Show that

$\langle f, g\rangle = \int_0^{2\pi} \overline{f(e^{i\theta})}\,g(e^{i\theta})\,d\theta$

is a positive definite hermitian form on P.

(b) Find an orthonormal basis for this form.

17. Determine whether or not the following rules define hermitian forms on the space $\mathbb{C}^{n\times n}$
of complex matrices, and if so, determine their signature.

(a) $A, B \mapsto \operatorname{trace}(A^*B)$ (b) $A, B \mapsto \operatorname{trace}(AB)$

18. Let A be a unitary matrix. Prove that |det A| = 1.

19. Let P be a unitary matrix, and let $X_1, X_2$ be eigenvectors for P, with distinct eigenvalues
$\lambda_1, \lambda_2$. Prove that $X_1$ and $X_2$ are orthogonal with respect to the standard hermitian product
on $\mathbb{C}^n$.

*20. Let A be any complex matrix. Prove that $I + A^*A$ is nonsingular.

21. Prove Proposition (4.20).

5. The Spectral Theorem

1. Prove that if T is a hermitian operator then the rule $\{v, w\} = \langle v, Tw\rangle = X^*MY$ defines a
second hermitian form on V.

2. Prove that the eigenvalues of a real symmetric matrix are real numbers.

3. Prove that eigenvectors associated to distinct eigenvalues of a hermitian matrix A are
orthogonal.

4. Find a unitary matrix P so that $PAP^*$ is diagonal, when

A =

5. Find a real orthogonal matrix P so that $PAP^t$ is diagonal, when

6. Prove the equivalence of conditions (a) and (b) of the Spectral Theorem.

7. Prove that a real symmetric matrix A is positive definite if and only if its eigenvalues are
positive.

8. Show that the only matrix which is both positive definite hermitian and unitary is the
identity I.

9. Let A be a real symmetric matrix. Prove that $e^A$ is symmetric and positive definite.

10. Prove that for any square matrix A, $\ker A = (\operatorname{im} A^*)^\perp$.

*11. Let $\zeta = e^{2\pi i/n}$, and let A be the n x n matrix $a_{jk} = \zeta^{jk}/\sqrt{n}$. Prove that A is unitary.

12. Show that for every complex matrix A there is a unitary matrix P such that $PAP^*$ is upper
triangular.

13. Let A be a hermitian matrix. Prove that there is a unitary matrix P with determinant 1
such that $PAP^*$ is diagonal.

*14. Let A, B be hermitian matrices which commute. Prove that there is a unitary matrix P
such that $PAP^*$ and $PBP^*$ are both diagonal.

15. Use the Spectral Theorem to give a new proof of the fact that a positive definite real
symmetric n x n matrix P has the form $P = AA^t$ for some n x n matrix A.

16. Let $\lambda, \mu$ be distinct eigenvalues of a complex symmetric matrix A, and let X, Y be eigenvectors
associated to these eigenvalues. Prove that X is orthogonal to Y with respect to dot
product.


6. Conics and Quadrics 

1. Determine the type of the quadric $x^2 + 4xy + 2xz + z^2 + 3x + z - 6 = 0$.

2. Suppose that (6.1) represents an ellipse. Instead of diagonalizing the form and then making
a translation to reduce to the standard type, we could make the translation first.
Show how to compute the required translation by calculus.

3. Discuss all degenerate loci for conics.

4. Give a necessary and sufficient condition, in terms of the coefficients of its equation, for
a conic to be a circle.

5. (a) Describe the types of conic in terms of the signature of the quadratic form.

(b) Do the same for quadrics in $\mathbb{R}^3$.

6. Describe the degenerate quadrics, that is, those which are not listed in (6.14).


7. The Spectral Theorem for Normal Operators

1. Show that for any normal matrix A, $\ker A = (\operatorname{im} A)^\perp$.

2. Prove or disprove: If A is a normal matrix and W is an A-invariant subspace of $V = \mathbb{C}^n$,
then $W^\perp$ is also A-invariant.

3. A matrix is skew-hermitian if $A^* = -A$. What can you say about the eigenvalues and the
possibility of diagonalizing such a matrix?

4. Prove that the cyclic shift operator

$\begin{bmatrix} 0 & 1 & & \\ & 0 & 1 & \\ & & \ddots & 1 \\ 1 & & & 0 \end{bmatrix}$

is normal, and determine its diagonalization.

5. Let P be a real matrix which is normal and has real eigenvalues. Prove that P is
symmetric.

6. Let P be a real skew-symmetric matrix. Prove that P is normal.

*7. Prove that the circulant

$\begin{bmatrix} c_0 & c_1 & c_2 & \cdots & c_n \\ c_n & c_0 & c_1 & \cdots & c_{n-1} \\ \vdots & & & & \vdots \\ c_1 & c_2 & c_3 & \cdots & c_0 \end{bmatrix}$

is a normal matrix.

8. (a) Let A be a complex symmetric matrix. Prove that eigenvectors of A with distinct
eigenvalues are orthogonal with respect to the bilinear form $X^tY$.

*(b) Give an example of a complex symmetric matrix A such that there is no $P \in O_n(\mathbb{C})$
with $PAP^t$ diagonal.

9. Let A be a normal matrix. Prove that A is hermitian if and only if all eigenvalues of A are
real, and that A is unitary if and only if every eigenvalue has absolute value 1.

10. Let V be a finite-dimensional complex vector space with a positive definite hermitian
form $\langle\ ,\ \rangle$, and let $T: V \to V$ be a linear operator on V. Let A be the matrix of T with
respect to an orthonormal basis B. The adjoint operator $T^*: V \to V$ is defined as the
operator whose matrix with respect to the same basis is $A^*$.

(a) Prove that T and $T^*$ are related by the equations $\langle Tv, w\rangle = \langle v, T^*w\rangle$ and
$\langle v, Tw\rangle = \langle T^*v, w\rangle$ for all $v, w \in V$. Prove that the first of these equations characterizes
$T^*$.

(b) Prove that $T^*$ does not depend on the choice of orthonormal basis.

(c) Let v be an eigenvector for T with eigenvalue $\lambda$, and let $W = v^\perp$ be the space of
vectors orthogonal to v. Prove that W is $T^*$-invariant.

11. Prove that for any linear operator T, $TT^*$ is hermitian.

12. Let V be a finite-dimensional complex vector space with a positive definite hermitian
form $\langle\ ,\ \rangle$. A linear operator $T: V \to V$ is called normal if $TT^* = T^*T$.

(a) Prove that T is normal if and only if $\langle Tv, Tw\rangle = \langle T^*v, T^*w\rangle$ for all $v, w \in V$, and
verify that hermitian operators and unitary operators are normal.

(b) Assume that T is a normal operator, and let v be an eigenvector for T, with eigenvalue
$\lambda$. Prove that v is also an eigenvector for $T^*$, and determine its eigenvalue.

(c) Prove that if v is an eigenvector, then $W = v^\perp$ is T-invariant, and use this to prove
the Spectral Theorem for normal operators.

8. Skew-Symmetric Forms 

1. Prove or disprove: A matrix A is skew-symmetric if and only if $X^tAX = 0$ for all X.

2. Prove that a form is skew-symmetric if and only if its matrix has the properties (8.4).

3. Prove or disprove: A skew-symmetric n x n matrix is singular if n is odd.

4. Prove or disprove: The eigenvalues of a real skew-symmetric matrix are purely
imaginary.

*5. Let S be a real skew-symmetric matrix. Prove that I + S is invertible, and that
$(I - S)(I + S)^{-1}$ is orthogonal.

*6. Let A be a real skew-symmetric matrix.

(a) Prove that $\det A \ge 0$.

(b) Prove that if A has integer entries, then det A is the square of an integer.

7. Let $\langle\ ,\ \rangle$ be a skew-symmetric form on a vector space V. Define orthogonality, null space,
and nondegenerate forms as in Section 2.

(a) Prove that the form is nondegenerate if and only if its matrix with respect to any basis
is nonsingular.

(b) Prove that if W is a subspace such that the restriction of the form to W is nondegenerate,
then $V = W \oplus W^\perp$.

(c) Prove that if the form is not identically zero, then there is a subspace W and a basis
of W such that the restriction of the form to W has matrix $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.

(d) Prove Theorem (8.6).


9. Summary of Results, in Matrix Notation

1. Determine the symmetry of the matrices AB + BA and AB - BA in the following cases.
(a) A, B symmetric (b) A, B hermitian (c) A, B skew-symmetric (d) A symmetric,
B skew-symmetric

2. State which of the following rules define operations of $GL_n(\mathbb{C})$ on the space $\mathbb{C}^{n\times n}$ of all
complex matrices:

$P, A \mapsto PAP^t$, $(P^{-1})^tA(P^{-1})$, $(P^{-1})^tAP^t$, $P^{-1}AP^t$, $AP^t$, $P^tA$.

3. (a) With each of the following types of matrices, describe the possible determinants:

(i) real orthogonal (ii) complex orthogonal (iii) unitary (iv) hermitian
(v) symplectic (vi) real symmetric, positive definite (vii) real symmetric, negative
definite


(b) Which of these types of matrices have only real eigenvalues? 
4. (a) Let E be an arbitrary complex matrix. Prove that the matrix $\begin{bmatrix} I & E^* \\ -E & I \end{bmatrix}$ is invertible.

(b) Find the inverse in block form $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.


*5. (a) What is wrong with the following argument? Let P be a real orthogonal matrix. Let X
be a (possibly complex) eigenvector of P, with eigenvalue $\lambda$. Then $X^tP^tX = (PX)^tX = \lambda X^tX$.
On the other hand, $X^tP^tX = X^t(P^{-1}X) = \lambda^{-1}X^tX$. Therefore $\lambda = \lambda^{-1}$, and so
$\lambda = \pm 1$.

(b) State and prove a correct theorem based on this argument.

*6. Show how to describe any element of $SO_4$ in terms of rotations of two orthogonal planes
in $\mathbb{R}^4$.

*7. Let A be a real m x n matrix. Prove that there are orthogonal matrices $P \in O_m$ and
$Q \in O_n$ such that PAQ = D is diagonal, with nonnegative diagonal entries.


In these days the angel of topology and the devil of abstract algebra
fight for the soul of every individual discipline of mathematics.

Hermann Weyl


1. THE CLASSICAL LINEAR GROUPS

Subgroups of the general linear group $GL_n$ are called linear groups. In this chapter
we will study the most important ones: the orthogonal, unitary, and symplectic
groups. They are called the classical groups.

The classical groups arise as stabilizers for some natural operations of $GL_n$ on
the space of n x n matrices. The first of these operations is that which describes
change of basis in a bilinear form. The rule

(1.1) $P, A \mapsto (P^t)^{-1}AP^{-1}$

is an operation of $GL_n$ on the set of all n x n matrices. This is true for any field of
scalars, but we will be interested in the real and complex cases. As we have seen in
Chapter 7 (1.15), the orbit of a matrix A under this operation is the set of matrices A'
which represent the form $X^tAY$, but with respect to different bases. It is customary to
call matrices in the same orbit congruent. We can set $Q = (P^t)^{-1}$ to obtain the equivalent
definition

(1.2) A and A' are congruent if $A' = QAQ^t$ for some $Q \in GL_n(F)$.

Sylvester’s Law [Chapter 7 (2.11)] describes the different orbits or congruence
classes of real symmetric matrices. Every congruence class of real symmetric matrices
contains exactly one matrix of the form Chapter 7 (2.10). The orthogonal
group, which we have defined before, is the stabilizer of the identity matrix for this
operation. As before, we will denote the real orthogonal group by the symbol $O_n$:

(1.3) $O_n = \{P \in GL_n(\mathbb{R}) \mid P^tP = I\}$.

The complex orthogonal group is defined analogously:

$O_n(\mathbb{C}) = \{P \in GL_n(\mathbb{C}) \mid P^tP = I\}$.

The stabilizer of the Lorentz form [Chapter 7 (2.16)], defined by the matrix

$I_{3,1} = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & 1 & \\ & & & -1 \end{bmatrix},$

is called the Lorentz group. It is denoted by $O_{3,1}(\mathbb{R})$ or $O_{3,1}$:

(1.4) $O_{3,1} = \{P \in GL_n(\mathbb{R}) \mid P^tI_{3,1}P = I_{3,1}\}$.

The linear operators represented by these matrices are often called Lorentz transformations.
The subscript (3,1) indicates the signature of the matrix, the number of
+1’s and -1’s. In this way an analogous group $O_{p,q}$ can be defined for any signature
(p, q).

The operation (1.1) also describes change of basis in forms which are not symmetric.
Thus Theorem (8.6) of Chapter 7 tells us this:

(1.5) Corollary. There is exactly one congruence class of real nonsingular skew-symmetric
m x m matrices, if m is even. □

The standard skew-symmetric form is defined by the 2n x 2n matrix J (Chapter 7
(8.5)), and its stabilizer is called the symplectic group

(1.6) $SP_{2n}(\mathbb{R}) = \{P \in GL_{2n}(\mathbb{R}) \mid P^tJP = J\}$.

Again, the complex symplectic group $SP_{2n}(\mathbb{C})$ is defined analogously.

Finally, the unitary group is defined in terms of the operation

(1.7) $P, A \mapsto (P^*)^{-1}AP^{-1}$.

This definition makes sense only when the field of scalars is the complex field. Exactly
as with bilinear forms, the orbit of a matrix A consists of the matrices which
define the form $\langle X, Y\rangle = X^*AY$ with respect to different bases (see [Chapter 7
(4.12)]). The unitary group is the stabilizer of the identity matrix for this action:

(1.8) $U_n = \{P \mid P^*P = I\}$.

Thus $U_n$ is the group of matrices representing changes of basis which leave the hermitian
dot product [Chapter 7 (4.2)] invariant.

The word special is added to indicate the subgroup of matrices with determinant 1.
This gives us some more groups:

Special linear group $SL_n(\mathbb{R})$: n x n matrices P with determinant 1;

Special orthogonal group $SO_n(\mathbb{R})$: the intersection $SL_n(\mathbb{R}) \cap O_n(\mathbb{R})$;

Special unitary group $SU_n$: the intersection $SL_n(\mathbb{C}) \cap U_n$.

Though this is not obvious from the definition, symplectic matrices have determinant
1, so the two uses of the letter S do not cause conflict.
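
In the smallest case the determinant claim can be checked by hand: for a 2 x 2 matrix P one finds $P^tJP = (\det P)J$, so P preserves J exactly when det P = 1. The sketch below illustrates only this 2 x 2 case, with hypothetical helper names and arbitrarily chosen integer matrices; it says nothing about larger n.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [list(row) for row in zip(*X)]

def det2(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

J = [[0, 1], [-1, 0]]

def preserves_J(P):
    """True if P^t J P == J, the defining condition (1.6) for n = 1."""
    return matmul(matmul(transpose(P), J), P) == J

candidates = [
    [[2, 1], [1, 1]],   # determinant 1
    [[2, 0], [0, 1]],   # determinant 2
    [[0, 1], [1, 0]],   # determinant -1
]
for P in candidates:
    print(det2(P), preserves_J(P))
```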


2. THE SPECIAL UNITARY GROUP $SU_2$

The main object of this chapter is to describe the geometric properties of the classical
linear groups, by considering them as subsets of the spaces $\mathbb{R}^{n\times n}$ or $\mathbb{C}^{n\times n}$ of all
matrices. We know the geometry of a few groups already. For example, $GL_1(\mathbb{C}) = \mathbb{C}^\times$
is the “punctured plane” $\mathbb{C} - \{0\}$. Also, if p is a 1 x 1 matrix, then $p^* = \bar p$.
Thus

(2.1) $U_1 = \{p \in \mathbb{C}^\times \mid \bar p p = 1\}$.

This is the set of complex numbers of absolute value 1, the unit circle in the complex
plane. We can identify it with the unit circle in $\mathbb{R}^2$,

$x_1^2 + x_2^2 = 1,$

by sending $x_1 + x_2 i \mapsto (x_1, x_2)$. The group $SO_2$ of rotations of the plane is isomorphic
to $U_1$. It is also a circle, embedded into $\mathbb{R}^{2\times 2}$ by the map

(2.2) $(x_1, x_2) \mapsto \begin{bmatrix} x_1 & -x_2 \\ x_2 & x_1 \end{bmatrix}.$


We will describe some more of the groups in the following sections. 
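
The isomorphism between $U_1$ and $SO_2$ amounts to saying that the map (2.2) turns multiplication of complex numbers into multiplication of matrices. The sketch below checks this on two points of the unit circle with rational coordinates; the names and the sample points are assumptions chosen so that the arithmetic is exact.

```python
from fractions import Fraction as F

def cmul(z, w):
    """Product of complex numbers stored as (real, imaginary) pairs."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def to_matrix(z):
    """Image of x1 + x2*i under the map (2.2)."""
    x1, x2 = z
    return [[x1, -x2], [x2, x1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

z = (F(3, 5), F(4, 5))      # 3/5 + 4/5 i, absolute value 1
w = (F(5, 13), F(12, 13))   # 5/13 + 12/13 i, absolute value 1

print(to_matrix(cmul(z, w)) == matmul(to_matrix(z), to_matrix(w)))
```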

The dimension of a linear group G is, roughly speaking, the number of degrees
of freedom of a matrix in G. The group $SO_2$, for example, has dimension 1. A matrix
in $SO_2$ represents rotation by an angle $\theta$, and this angle is the single parameter
needed to determine the rotation. We will discuss dimension more carefully in Section 7,
but we want to describe some of the low-dimensional groups explicitly first.
The smallest dimension in which really interesting groups appear is 3, and three of
these, $SU_2$, $SO_3$, and $SL_2(\mathbb{R})$, are very important. We will study the special unitary
group $SU_2$ in this section.

Let $P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be an element of $SU_2$, with $a, b, c, d \in \mathbb{C}$. The equations
defining $SU_2$ are $P^*P = I$ and det P = 1. By Cramer’s Rule,

$P^{-1} = (\det P)^{-1}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$

Since $P^{-1} = P^*$ for a matrix in $SU_2$, we find

$\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} \bar a & \bar c \\ \bar b & \bar d \end{bmatrix},$ or

(2.3) $\bar a = d$, and $b = -\bar c$.

Thus

(2.4) $P = \begin{bmatrix} a & b \\ -\bar b & \bar a \end{bmatrix}.$

The condition det P = 1 has become lost in the computation and must be put back:

(2.5) $\bar a a + \bar b b = 1$.

Equations (2.3) and (2.5) provide a complete list of conditions describing the entries
of a matrix in $SU_2$. The matrix P is described by the vector $(a, b) \in \mathbb{C}^2$ of length 1,
and any such vector gives us a matrix $P \in SU_2$ by the rule (2.4).

If we write out a, b in terms of their real and imaginary parts, equation (2.5)
gives us a bijective correspondence between $SU_2$ and points of $\mathbb{R}^4$ lying on the locus

(2.6) $x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1$.

This equation is equivalent to (2.5) if we set $a = x_1 + x_2 i$ and $b = x_3 + x_4 i$.
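
The correspondence can be tried out on a point of the locus (2.6) with rational coordinates. The following sketch is an illustration under stated assumptions: complex numbers are stored as pairs of fractions so that the arithmetic is exact, all helper names are hypothetical, and the sample point (1/3, 2/3, 2/3, 0) is an arbitrary choice.

```python
from fractions import Fraction as F

def conj(z):
    return (z[0], -z[1])

def neg(z):
    return (-z[0], -z[1])

def add(z, w):
    return (z[0] + w[0], z[1] + w[1])

def mul(z, w):
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def su2_from_point(x1, x2, x3, x4):
    """The matrix [[a, b], [-conj(b), conj(a)]] with a = x1 + x2 i, b = x3 + x4 i."""
    a, b = (x1, x2), (x3, x4)
    return [[a, b], [neg(conj(b)), conj(a)]]

def adjoint(P):
    return [[conj(P[j][i]) for j in range(2)] for i in range(2)]

def matmul(X, Y):
    return [[add(mul(X[i][0], Y[0][j]), mul(X[i][1], Y[1][j]))
             for j in range(2)] for i in range(2)]

def det(P):
    return add(mul(P[0][0], P[1][1]), neg(mul(P[0][1], P[1][0])))

ONE, ZERO = (F(1), F(0)), (F(0), F(0))
IDENTITY = [[ONE, ZERO], [ZERO, ONE]]

P = su2_from_point(F(1, 3), F(2, 3), F(2, 3), F(0))
print(matmul(adjoint(P), P) == IDENTITY, det(P) == ONE)
```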

The locus (2.6) is called the unit 3-sphere in $\mathbb{R}^4$, in analogy with the unit
sphere in $\mathbb{R}^3$. The number 3 refers to its dimension, the number of degrees of freedom
of a point on the sphere. Thus the unit sphere

$x_1^2 + x_2^2 + x_3^2 = 1$

in $\mathbb{R}^3$, being a surface, is called a 2-sphere. The unit circle in $\mathbb{R}^2$, a curve, is called a
1-sphere. We will sometimes denote a sphere of dimension d by $S^d$.

A bijective map $f: S \to S'$ between subsets of Euclidean spaces is called a
homeomorphism if f and $f^{-1}$ are continuous maps (Appendix, Section 3). The correspondence
between $SU_2$, considered as a subset of $\mathbb{C}^{2\times 2}$, and the sphere (2.6) is obviously
continuous, as is its inverse. Therefore these two spaces are homeomorphic.

(2.7) $SU_2$ is homeomorphic to the unit 3-sphere in $\mathbb{R}^4$.

It is convenient to identify $SU_2$ with the 3-sphere. We can do this if we represent
the matrix (2.4) by its top row, the vector $(a, b) \in \mathbb{C}^2$, or by the vector
$(x_1, x_2, x_3, x_4) \in \mathbb{R}^4$. These representations can be thought of as different notations
for the same element P of the group, and we will pass informally from one representation
to the other. For geometric visualization, the representations P = (a, b) and
$P = (x_1, x_2, x_3, x_4)$, being in lower-dimensional spaces, are more convenient.

The fact that the 3-sphere has a group structure is remarkable, because there is
no way to make the 2-sphere into a group with a continuous law of composition. In
fact, a famous theorem of topology asserts that the only spheres with continuous
group laws are the 1-sphere, which is realized as the rotation group $SO_2$, and the
3-sphere $SU_2$.

We will now describe the algebraic structures on SU 2 analogous to the curves 
of constant latitude and longitude on the 2-sphere. The matrices /， - / will play the 
roles of the north and south poles. In our vector notation, they are the points 
(士 1 ， 0,0, 0) of the sphere. 

If the poles of the 2-sphere $x_1^2 + x_2^2 + x_3^2 = 1$ are placed at the points
$(\pm 1, 0, 0)$, then the latitudes are the circles $x_1 = c$, $-1 < c < 1$. The analogues on
the 3-sphere $SU_2$ of these latitudes are the surfaces on which the coordinate $x_1$ is
constant. They are two-dimensional spheres, embedded into $\mathbb{R}^4$ by

(2.8) $x_1 = c$ and $x_2^2 + x_3^2 + x_4^2 = 1 - c^2$, $-1 < c < 1$.

These sets can be described algebraically as conjugacy classes in $SU_2$.

(2.9) Proposition. Except for two special classes, the conjugacy classes in $SU_2$ are
the latitudes, the sets defined by the equations (2.8). For a given $c$ in the interval
$(-1, 1)$, this set consists of all matrices $P \in SU_2$ such that trace $P = 2c$. The
remaining conjugacy classes are $\{I\}$ and $\{-I\}$, each consisting of one element. These two
classes make up the center $Z = \{\pm I\}$ of the group $SU_2$.

Proof. The characteristic polynomial of the matrix $P$ (2.4) is

(2.10) $t^2 - (a + \bar a)t + 1 = t^2 - 2x_1 t + 1$.

This polynomial has a pair $\lambda, \bar\lambda$ of complex conjugate roots on the unit circle, and
the roots, the eigenvalues of $P$, depend only on trace $P = 2x_1$. Furthermore, two
matrices with different traces have different eigenvalues. The proposition will follow if
we show that the conjugacy class of $P$ contains every matrix in $SU_2$ with the same
eigenvalues. The cases $x_1 = 1, -1$ correspond to the two special conjugacy classes
$\{I\}$, $\{-I\}$, so the proof is completed by the next lemma.

(2.11) Lemma. Let $P$ be an element of $SU_2$, with eigenvalues $\lambda, \bar\lambda$. Then $P$ is
conjugate in $SU_2$ to the matrix

$\begin{bmatrix} \lambda & 0 \\ 0 & \bar\lambda \end{bmatrix}$.

Proof. By the Spectral Theorem for normal operators [Chapter 7 (7.3)], there
is a unitary matrix $Q$ so that $QPQ^*$ is diagonal. We only have to show that $Q$ can be
chosen so as to have determinant 1. Say that $\det Q = \delta$. Since $Q^*Q = I$,
$(\det Q^*)(\det Q) = \bar\delta\delta = 1$; hence $\delta$ has absolute value 1. Let $\epsilon$ be a square root of $\bar\delta$.
Then $\bar\epsilon\epsilon = 1$ too. The matrix $Q_1 = \epsilon Q$ is in $SU_2$, and $P_1 = Q_1 P Q_1^*$ is also diagonal.
The diagonal entries of $P_1$ are the eigenvalues $\lambda, \bar\lambda$. The eigenvalues can be
interchanged, if desired, by conjugating by the matrix

(2.12) $Q_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$,

which is also an element of $SU_2$. □
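The two facts used in this proof, that conjugation does not change the trace and that the eigenvalues are determined by $x_1$ through (2.10), can be checked numerically. The sketch below assumes the matrix form $[[a, b], [-\bar b, \bar a]]$ for elements of $SU_2$; the helper names and the sample matrices are illustrative only.

```python
import math


def su2(x1, x2, x3, x4):
    """Element of SU_2 with a = x1 + x2*i, b = x3 + x4*i (point on the 3-sphere)."""
    a, b = complex(x1, x2), complex(x3, x4)
    return ((a, b), (-b.conjugate(), a.conjugate()))


def mul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def adjoint(A):
    return tuple(tuple(A[j][i].conjugate() for j in range(2)) for i in range(2))


P = su2(0.5, 0.5, 0.5, 0.5)
Q = su2(0.0, 0.6, 0.8, 0.0)
P_conj = mul(mul(Q, P), adjoint(Q))      # the conjugate Q P Q*

c = P[0][0].real                          # x1, so trace P = 2c
lam = complex(c, math.sqrt(1 - c * c))    # candidate eigenvalue on the unit circle
char_poly_at_lam = lam * lam - 2 * c * lam + 1   # (2.10) evaluated at lam
```

The conjugate `P_conj` should have the same first coordinate $x_1$ as `P`, which places both on the same latitude.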

Next we will introduce the longitudes of $SU_2$. The longitudes on the 2-sphere
$x_1^2 + x_2^2 + x_3^2 = 1$ can be described as intersections of the sphere with planes
containing the two poles $(\pm 1, 0, 0)$. When we add a fourth variable $x_4$ to get the
equation of the 3-sphere, a natural way to extend this definition is to form the
intersection with a two-dimensional subspace of $\mathbb{R}^4$ containing the two poles $\pm I$. This is a
circle in $SU_2$, and we will think of these circles as the longitudes. Thus while the
latitudes on $SU_2$ are 2-spheres, the longitudes are 1-spheres, the “great circles” through
the poles.

Note that every point $P = (x_1, x_2, x_3, x_4)$ of $SU_2$ except for the poles is
contained in exactly one longitude. This is because if $P$ is not a pole, then $P$ and $I$ will
be linearly independent and thus will span a subspace $V$ of $\mathbb{R}^4$ of dimension 2. The
intersection $SU_2 \cap V$ is the unique longitude containing $P$.

The intersection of $SU_2$ with the plane $W$ defined by $x_3 = x_4 = 0$ is a
particularly nice longitude. In matrix notation, this great circle consists of the diagonal
matrices in $SU_2$, which form a subgroup $T$:

(2.13) $T = \left\{ \begin{bmatrix} \lambda & 0 \\ 0 & \bar\lambda \end{bmatrix} \;\middle|\; \lambda\bar\lambda = 1 \right\}$.


The other longitudes are described in the following proposition. 


(2.14) Proposition. The longitudes of $SU_2$ are the conjugate subgroups $QTQ^*$ of
the subgroup $T$.


(2.15) Figure. Some latitudes and longitudes in $SU_2$. [Labels in the figure: $SU_2$, $SO_2$, diagonal matrices, trace-zero matrices.]


In Figure (2.15) the 3-sphere $SU_2$ is projected from $\mathbb{R}^4$ onto the unit disc in the
plane. The conjugacy class shown is the “equatorial” latitude in $\mathbb{R}^4$, which is defined
by the equation $x_1 = 0$. Just as the orthogonal projection of a circle from $\mathbb{R}^3$ to $\mathbb{R}^2$ is
an ellipse, the projection of this 2-sphere from $\mathbb{R}^4$ to $\mathbb{R}^3$ is an ellipsoid, and the
further projection of this ellipsoid to the plane is the elliptical disc shown.

Proof of Proposition (2.14). The point here is to show that any conjugate
subgroup $QTQ^*$ is a longitude. Lemma (2.11) tells us that every element $P \in SU_2$ lies in
one of these conjugate subgroups (though the roles of $Q$ and $Q^*$ have been reversed).
Since every $P \neq \pm I$ is contained in exactly one longitude, it will follow that every
longitude is one of the subgroups $QTQ^*$.

So let us show that a conjugate subgroup $QTQ^*$ is a longitude. The reason this
is true is that conjugation by a fixed element $Q$ is a linear operator which sends the
subspace $W$ to another subspace. We will compute the conjugate explicitly to make
this clear. Say that $Q$ is the matrix (2.4). Let $w = (w_1, w_2, 0, 0)$ denote a variable
element of $W$, and set $z = w_1 + w_2 i$. Then

$\begin{bmatrix} a & b \\ -\bar b & \bar a \end{bmatrix} \begin{bmatrix} z & 0 \\ 0 & \bar z \end{bmatrix} \begin{bmatrix} \bar a & -b \\ \bar b & a \end{bmatrix} = \begin{bmatrix} a\bar a z + b\bar b \bar z & ab(\bar z - z) \\ * & * \end{bmatrix}$.

Computing these entries, we find that $w$ is sent to the vector $u = (u_1, u_2, u_3, u_4)$,
where

$u_1 = w_1$, $u_2 = (x_1^2 + x_2^2 - x_3^2 - x_4^2) w_2$,

$u_3 = 2(x_1 x_4 + x_2 x_3) w_2$, $u_4 = 2(x_2 x_4 - x_1 x_3) w_2$.

The coordinates $u_i$ are real linear combinations of $(w_1, w_2)$. This shows that the map
$w \mapsto u$ is a real linear transformation. So its image $V$ is a subspace of $\mathbb{R}^4$. The
conjugate group $QTQ^*$ is $SU_2 \cap V$. Since $QTQ^*$ contains the poles $\pm I$, so does $V$, and
this shows that $QTQ^*$ is a longitude. □
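The coordinate formulas for $u$ can be compared against a direct matrix computation. The following sketch assumes that $Q$ has the form $[[a, b], [-\bar b, \bar a]]$ with $a = x_1 + x_2 i$, $b = x_3 + x_4 i$; the function names and the sample values are hypothetical.

```python
def mul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def adjoint(A):
    return tuple(tuple(A[j][i].conjugate() for j in range(2)) for i in range(2))


def conjugate_diagonal(x, w1, w2):
    """Top row of Q diag(z, conj z) Q*, returned as a vector (u1, u2, u3, u4)."""
    x1, x2, x3, x4 = x
    a, b = complex(x1, x2), complex(x3, x4)
    z = complex(w1, w2)
    Q = ((a, b), (-b.conjugate(), a.conjugate()))
    D = ((z, 0j), (0j, z.conjugate()))
    top = mul(mul(Q, D), adjoint(Q))[0]
    return (top[0].real, top[0].imag, top[1].real, top[1].imag)


def formula(x, w1, w2):
    """The same vector, computed from the closed formulas for u1, ..., u4."""
    x1, x2, x3, x4 = x
    return (
        w1,
        (x1**2 + x2**2 - x3**2 - x4**2) * w2,
        2 * (x1 * x4 + x2 * x3) * w2,
        2 * (x2 * x4 - x1 * x3) * w2,
    )


x = (0.5, 0.5, 0.5, 0.5)            # a point of the 3-sphere, i.e. an element Q
direct = conjugate_diagonal(x, 0.6, 0.8)
closed = formula(x, 0.6, 0.8)
```

Both computations should agree up to rounding. Note that $u_1 = w_1$ relies on $a\bar a + b\bar b = 1$, so `x` must lie on the unit 3-sphere.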

We will describe another geometric configuration briefly. As we have seen, the
subgroup $T$ of diagonal matrices is a great circle in the 3-sphere $SU_2$. The left cosets
of this subgroup, the sets of the form $QT$ for $Q \in SU_2$, are also great circles, and
they partition the group $SU_2$. Thus the 3-sphere is partitioned into great circles. This
very interesting configuration is called the Hopf fibration.


3. THE ORTHOGONAL REPRESENTATION OF $SU_2$

We saw in the last section that the conjugacy classes in the special unitary group $SU_2$
are two-dimensional spheres. Since conjugacy classes are orbits for the operation of
conjugation, $SU_2$ operates on these spheres. In this section we will show that
conjugation by an element $P \in SU_2$ acts on each of the spheres as a rotation, and that the
map sending $P$ to the matrix of this rotation defines a surjective homomorphism

(3.1) $\varphi: SU_2 \to SO_3$,

whose kernel is the center $Z = \{\pm I\}$ of $SU_2$. This homomorphism is called the
orthogonal representation of $SU_2$. It represents a complex $2 \times 2$ matrix $P$ in $SU_2$ by a
real $3 \times 3$ rotation matrix $\varphi(P)$.

The safest way to show that $P$ operates by rotating a conjugacy class may be to
write the matrix representing the rotation down explicitly. This is done in (3.11).
However, the formula for $\varphi(P)$ is complicated and not particularly enlightening. It is
better to describe $\varphi$ indirectly, as we will do presently. Let us discuss the geometry
of the map first.

Since the kernel of $\varphi$ is $\{\pm I\}$, its cosets are the sets $\{\pm P\}$. They form the fibres
of the homomorphism. Thus every element of $SO_3$ corresponds to a pair of unitary
matrices which differ by sign. Because of this, the group $SU_2$ is called a double
covering of the group $SO_3$.

The map $\mu: SO_2 \to SO_2$ of the 1-sphere to itself defined by $\rho_\theta \mapsto \rho_{2\theta}$ is
another example of a double covering. Its kernel also consists of two elements, the
identity and the rotation by $\pi$. Every fibre of $\mu$ contains two rotations $\rho_\theta$ and
$\rho_{\theta + \pi}$.

The orthogonal representation can be used to identify the topological
structure of the rotation group. In vector notation, if $P = (x_1, \ldots, x_4)$, then
$-P = (-x_1, \ldots, -x_4)$, and the point $-P$ is called the antipode of $P$. So since points of the
rotation group correspond to cosets $\{\pm P\}$, the group $SO_3$ can be obtained by
identifying antipodal points on the 3-sphere $SU_2$. The space obtained in this way is called
the real projective 3-space:

(3.3) $SO_3$ is homeomorphic to the real projective 3-space.

The number 3 refers again to the dimension of the space. Points of the real
projective 3-space are also in bijective correspondence with lines through the origin (or
one-dimensional subspaces) of $\mathbb{R}^4$. Every line through the origin meets the unit
sphere in a pair of antipodal points.

As we noted in Section 8 of Chapter 4, every element of $SO_3$ except the
identity can be described in terms of a pair $(v, \theta)$, where $v$ is a unit vector in the axis of
rotation and where $\theta$ is the angle of rotation. However, the two pairs $(v, \theta)$ and
$(-v, -\theta)$ represent the same rotation. The choice of one of these pairs is referred to
by physicists as the choice of a spin. It is not possible to make a choice of spin which
varies continuously over the whole group. Instead, the two possible choices define a
double covering of $SO_3 - \{I\}$. We may realize the set of all pairs $(v, \theta)$ as the
product space $S \times \Theta$, where $S$ is the 2-sphere of unit vectors in $\mathbb{R}^3$, and where $\Theta$ is
the set of nonzero angles $0 < \theta < 2\pi$. This product space maps to $SO_3$:

(3.4) $\psi: S \times \Theta \to SO_3 - \{I\}$,

by sending $(v, \theta)$ to the rotation about $v$ through the angle $\theta$. The map $\psi$ is a double
covering of $SO_3 - \{I\}$ because every nontrivial rotation is associated to two pairs
$(v, \theta)$, $(-v, -\theta)$.

We now have two double coverings of $SO_3 - \{I\}$, namely $S \times \Theta$ and also
$SU_2 - \{\pm I\}$, and it is plausible that they are equivalent. This is true:

(3.5) Proposition. There is a homeomorphism $h: (SU_2 - \{\pm I\}) \to S \times \Theta$ which
is compatible with the maps to $SO_3$, i.e., such that $\psi \circ h = \varphi$.

This map $h$ is not a group homomorphism. In fact, neither its domain nor its range is
a group.

Proposition (3.5) is not very difficult to prove, but the proof is slightly elusive
because there are two such homeomorphisms. They differ by a switch of the spin.
On the other hand, the fact that this homeomorphism exists follows from a general
theorem of topology, because the space $SU_2 - \{\pm I\}$ is simply connected. (A simply
connected space is one which is path connected and such that every loop in the space
can be contracted continuously to a point.) It is better to leave this proof to the
topologists. □

Therefore every element of $SU_2$ except $\pm I$ can be described as a rotation of $\mathbb{R}^3$
together with a choice of spin. Because of this, $SU_2$ is often called the Spin group.

We now proceed to compute the homomorphism $\varphi$, and to begin, we must
select a conjugacy class. It is convenient to choose the one consisting of the trace-zero
matrices in $SU_2$, which is the one defined by $x_1 = 0$ and which is illustrated in
Figure (2.15). The group operates in the same way on the other classes. Let us call the
conjugacy class of trace-zero matrices $C$. An element $A$ of $C$ will be a matrix of the
form

(3.6) $A = \begin{bmatrix} y_2 i & y_3 + y_4 i \\ -y_3 + y_4 i & -y_2 i \end{bmatrix}$,

where

(3.7) $y_2^2 + y_3^2 + y_4^2 = 1$.

Notice that this matrix is skew-hermitian, that is, it has the property

(3.8) $A^* = -A$.

(We haven’t run across skew-hermitian matrices before, but they aren’t very
different from hermitian matrices. In fact, $A$ is a skew-hermitian matrix if and only if
$H = iA$ is hermitian.) The $2 \times 2$ skew-hermitian matrices with trace zero form a real
vector space $V$ of dimension 3, with basis

(3.9) $\mathbf{B} = \left( \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix} \right)$.

In the notation of (3.6), $A = \mathbf{B}Y$, where $Y = (y_2, y_3, y_4)^t$. So the basis $\mathbf{B}$ corresponds
to the standard basis $(e_2, e_3, e_4)$ in the space $\mathbb{R}^3$, and (3.7) tells us that our conjugacy
class is represented as the unit sphere in this space.

Note that $SU_2$ operates by conjugation on the whole space $V$ of trace-zero,
skew-hermitian matrices, not only on its unit sphere: If $A \in V$, $P \in SU_2$, and if
$B = PAP^* = PAP^{-1}$, then trace $B = 0$, and $B^* = (PAP^*)^* = PA^*P^* = P(-A)P^* = -B$.
Also, conjugation by a fixed matrix $P$ gives a linear operator on $V$, because
$P(A + A')P^* = PAP^* + PA'P^*$, and if $r$ is a real number, then $P(rA)P^* = rPAP^*$.

The matrix of this linear operator is defined to be $\varphi(P)$. To determine the matrix
explicitly, we conjugate the basis (3.9) by $P$ and rewrite the result in terms of the basis.
For example,

(3.10) $\begin{bmatrix} a & b \\ -\bar b & \bar a \end{bmatrix} \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix} \begin{bmatrix} \bar a & -b \\ \bar b & a \end{bmatrix} = i \begin{bmatrix} a\bar a - b\bar b & -2ab \\ -2\bar a\bar b & b\bar b - a\bar a \end{bmatrix}$.

The coordinates of this matrix are $y_2 = a\bar a - b\bar b$, $y_3 = i(-ab + \bar a\bar b)$, and
$y_4 = -(ab + \bar a\bar b)$. They form the first column of the matrix $\varphi(P)$. Similar computation for
the other columns yields

(3.11) $\varphi(P) = \begin{bmatrix} a\bar a - b\bar b & i(\bar a b - a\bar b) & \bar a b + a\bar b \\ i(\bar a\bar b - ab) & \tfrac12(a^2 + \bar a^2 + b^2 + \bar b^2) & \tfrac{i}{2}(a^2 - \bar a^2 - b^2 + \bar b^2) \\ -(\bar a\bar b + ab) & \tfrac{i}{2}(\bar a^2 - a^2 + \bar b^2 - b^2) & \tfrac12(a^2 + \bar a^2 - b^2 - \bar b^2) \end{bmatrix}$.


We will not make use of the above computation. Even without it, we know
that $\varphi(P)$ is a real $3 \times 3$ matrix because it is the matrix of a linear operator on a real
vector space $V$ of dimension 3.
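The recipe "conjugate each basis matrix by $P$ and read off coordinates" is easy to carry out numerically, which gives a way to experiment with $\varphi$ without the explicit formula. The sketch below is an illustration under stated assumptions: the three basis matrices are the ones consistent with (3.10) and (3.18), coordinates are taken in the order $(y_2, y_3, y_4)$, and all function names are hypothetical.

```python
import cmath

# Assumed basis of V: the matrices with coordinates (1,0,0), (0,1,0), (0,0,1).
BASIS = (
    ((1j, 0j), (0j, -1j)),
    ((0j, 1 + 0j), (-1 + 0j, 0j)),
    ((0j, 1j), (1j, 0j)),
)


def mul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def adjoint(A):
    return tuple(tuple(A[j][i].conjugate() for j in range(2)) for i in range(2))


def coords(A):
    """Coordinates (y2, y3, y4) of A = [[y2 i, y3 + y4 i], [-y3 + y4 i, -y2 i]]."""
    return (A[0][0].imag, A[0][1].real, A[0][1].imag)


def phi(P):
    """3x3 real matrix whose j-th column holds the coordinates of P E_j P*."""
    cols = [coords(mul(mul(P, E), adjoint(P))) for E in BASIS]
    return tuple(tuple(cols[j][i] for j in range(3)) for i in range(3))


def gram(M):
    """M^t M, which is the identity exactly when M is orthogonal."""
    return tuple(
        tuple(sum(M[k][i] * M[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def det3(M):
    return (
        M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
        - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
        + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0])
    )


a, b = complex(0.5, 0.5), complex(0.5, 0.5)       # illustrative element of SU_2
P = ((a, b), (-b.conjugate(), a.conjugate()))
minus_P = tuple(tuple(-entry for entry in row) for row in P)
R = phi(P)

theta = 0.3                                        # a diagonal element of SU_2
D = ((cmath.exp(1j * theta), 0j), (0j, cmath.exp(-1j * theta)))
R_diag = phi(D)
```

With these assumptions, `R` should come out orthogonal with determinant 1, `phi(minus_P)` should equal `R`, and `R_diag` should be a rotation through the angle $2\theta$ about the first coordinate axis.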


(3.12) Lemma. The map $P \mapsto \varphi(P)$ defines a homomorphism $SU_2 \to GL_3(\mathbb{R})$.

Proof. It follows from the associative law [Chapter 5 (5.1)] for the operation
of conjugation that $\varphi$ is compatible with multiplication: The operation of a product
$PQ$ on a matrix $A$ is $(PQ)A(PQ)^* = P(QAQ^*)P^*$. This is the composition of the
operations of conjugation by $P$ and by $Q$. Since the matrix of the composition of linear
operators is the product matrix, $\varphi(PQ) = \varphi(P)\varphi(Q)$. Since $\varphi$ is compatible with
multiplication, $\varphi(P^{-1})\varphi(P) = \varphi(I_2) = I_3$. Therefore $\varphi(P)$ is invertible for every $P$, and so $\varphi$
is a homomorphism from $SU_2$ to $GL_3(\mathbb{R})$, as asserted. □


(3.13) Lemma. For any $P$, $\varphi(P) \in SO_3$. Hence $P \mapsto \varphi(P)$ defines a
homomorphism $SU_2 \to SO_3$.

Proof. One could prove this lemma using Formula (3.11). To prove it
conceptually, we note that dot product on $\mathbb{R}^3$ carries over to a bilinear form on $V$ with a
nice expression in terms of the matrices. Using the notation of (3.6), we define
$\langle A, A' \rangle = y_2 y_2' + y_3 y_3' + y_4 y_4'$. Then

(3.14) $\langle A, A' \rangle = -\tfrac12 \operatorname{trace}(AA')$.

This is proved by computation:

$AA' = \begin{bmatrix} -\langle A, A' \rangle + i(y_3 y_4' - y_4 y_3') & * \\ * & -\langle A, A' \rangle - i(y_3 y_4' - y_4 y_3') \end{bmatrix}$,

and so $\operatorname{trace} AA' = -2\langle A, A' \rangle$.

This expression for dot product shows that it is preserved by conjugation by an
element $P \in SU_2$:

$\langle PAP^*, PA'P^* \rangle = -\tfrac12 \operatorname{trace}(PAP^*PA'P^*) = -\tfrac12 \operatorname{trace}(PAA'P^*) = -\tfrac12 \operatorname{trace}(AA') = \langle A, A' \rangle$.

Or, in terms of the coordinate vectors, $(\varphi(P)Y \cdot \varphi(P)Y') = (Y \cdot Y')$. It follows that
$\varphi(P)$ lies in the orthogonal group $O_3 = O_3(\mathbb{R})$ [Chapter 4 (5.13)].

To complete the proof, let us verify that $\varphi(P)$ has determinant 1 for every
$P \in SU_2$: Being a sphere, $SU_2$ is path connected. So only one of the two possible
values $\pm 1$ can be taken on by the continuous function $\det \varphi(P)$. Since $\varphi(I_2) = I_3$
and $\det I_3 = 1$, the value is always $+1$, and $\varphi(P) \in SO_3$, as required. □

(3.15) Lemma. $\ker \varphi = \{\pm I\}$.

Proof. The kernel of $\varphi$ consists of the matrices $P \in SU_2$ which act trivially on
$V$, meaning that $PAP^* = A$ for all skew-hermitian matrices $A$ of trace zero. Suppose
that $P$ has the property $PAP^* = A$, or $PA = AP$, for all $A \in V$. We test it on the basis
(3.9). The test leads to $b = 0$, $a = \bar a$, which gives the two possibilities $P = \pm I$, and
they are in the kernel. So $\ker \varphi = \{\pm I\}$, as claimed. □


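(3.16) Lemma. The image of the map $\varphi$ is $SO_3$.

Proof. We first compute $\varphi(P)$ explicitly on the subgroup $T$ of diagonal
matrices in $SU_2$. Let $z = y_3 + y_4 i$. Then

(3.17) $PAP^* = \begin{bmatrix} a & 0 \\ 0 & \bar a \end{bmatrix} \begin{bmatrix} y_2 i & z \\ -\bar z & -y_2 i \end{bmatrix} \begin{bmatrix} \bar a & 0 \\ 0 & a \end{bmatrix} = \begin{bmatrix} y_2 i & a^2 z \\ -\bar a^2 \bar z & -y_2 i \end{bmatrix}$.

So $\varphi(P)$ fixes the first coordinate $y_2$ and it multiplies $z$ by $a^2$. Since $|a| = 1$, we may
write $a = e^{i\theta}$. Multiplication by $a^2 = e^{2i\theta}$ defines a rotation by $2\theta$ of the complex
$z$-plane. Therefore

(3.18) $\varphi(P) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos 2\theta & -\sin 2\theta \\ 0 & \sin 2\theta & \cos 2\theta \end{bmatrix}$.

This shows that the image of $\varphi$ in $SO_3$ contains the subgroup $H$ of all rotations about
the point $(1, 0, 0)^t$. This point corresponds to the matrix $E = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$. Since the unit
sphere $C$ is a conjugacy class, the operation of $SU_2$ is transitive. So if $Y$ is any unit
vector in $\mathbb{R}^3$, there is an element $Q \in SU_2$ such that $\varphi(Q)(1, 0, 0)^t = Y$, or in matrix
notation, such that $QEQ^* = A$. The conjugate subgroup $\varphi(Q)H\varphi(Q)^{-1}$ of rotations
about $Y$ is also in the image of $\varphi$. Since every element of $SO_3$ is a rotation, $\varphi$ is
surjective. □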


The cosets making up the Hopf fibration, which was mentioned at the end of
the last section, are the fibres of a continuous surjective map

(3.19) $\pi: S^3 \to S^2$

from the 3-sphere to the 2-sphere. To define $\pi$, we interpret $S^3$ as the special
unitary group $SU_2$, and $S^2$ as the conjugacy class $C$ of trace-zero matrices, as above. We
set $E = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}$, and we define $\pi(P) = PEP^*$, for $P \in SU_2$. The proof of the
following proposition is left as an exercise.


(3.20) Proposition. The fibres of the map $\pi$ are the left cosets $QT$ of the group $T$
of diagonal matrices in $SU_2$. □
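One half of this proposition, that $\pi$ is constant on each coset $QT$, can be observed numerically: multiplying $Q$ on the right by a diagonal matrix does not change $QEQ^*$. The sketch below assumes $E$ is the diagonal matrix with entries $i, -i$ and uses hypothetical helper names; it illustrates the statement and does not replace the exercise.

```python
import cmath

E = ((1j, 0j), (0j, -1j))


def mul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def adjoint(A):
    return tuple(tuple(A[j][i].conjugate() for j in range(2)) for i in range(2))


def hopf(P):
    """pi(P) = P E P*, returned as the coordinate vector (y2, y3, y4)."""
    A = mul(mul(P, E), adjoint(P))
    return (A[0][0].imag, A[0][1].real, A[0][1].imag)


a, b = complex(0.5, 0.5), complex(0.5, 0.5)      # illustrative element Q of SU_2
Q = ((a, b), (-b.conjugate(), a.conjugate()))
lam = cmath.exp(0.7j)                            # illustrative element of T
D = ((lam, 0j), (0j, lam.conjugate()))

y = hopf(Q)
y_shifted = hopf(mul(Q, D))                      # another point of the coset QT
```

Both `y` and `y_shifted` should be the same point of the unit 2-sphere, because $D$ commutes with $E$.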


4. THE SPECIAL LINEAR GROUP $SL_2(\mathbb{R})$


Since the special unitary group is a sphere, it is a compact set. As an example of a
noncompact group, we will describe the special linear group $SL_2(\mathbb{R})$. To simplify
notation, we denote $SL_2(\mathbb{R})$ by $SL_2$ in this section.

Invertible $2 \times 2$ matrices operate by left multiplication on the space $\mathbb{R}^2$ of
column vectors, and we can look at the associated action on rays in $\mathbb{R}^2$. A ray is a
half line $R = \{rX \mid r \ge 0\}$. The set of rays is in bijective correspondence with the
points on the unit circle $S^1$, the ray $R$ corresponding to the point $R \cap S^1$.

Our group $SL_2$ operates by left multiplication on the set of rays. Let us denote
by $H$ the stabilizer of the ray $R_1 = \{re_1\}$ in $SL_2(\mathbb{R})$. It consists of matrices

(4.1) $B = \begin{bmatrix} a & b \\ 0 & a^{-1} \end{bmatrix}$,

where $a$ is positive and $b$ is arbitrary.

The rotation group $SO_2$ is another subgroup of $SL_2$, and it operates transitively
on the set of rays.


(4.2) Proposition. The map $f: SO_2 \times H \to SL_2$ defined by $f(Q, B) = QB$ is a
homeomorphism (but not a group homomorphism).

Proof. Notice that $H \cap SO_2 = \{I\}$. Therefore $f$ is injective [Chapter 2 (8.6)].
To prove surjectivity of $f$, let $P$ be an arbitrary element of $SL_2$, and let $R_1$ be the ray
$\{re_1 \mid r \ge 0\}$. Choose a rotation $Q \in SO_2$ such that $PR_1 = QR_1$. Then $Q^{-1}P$ is in the
stabilizer $H$, say $Q^{-1}P = B$, or

(4.3) $P = QB$.

Since $f$ is defined by matrix multiplication, it is a continuous map. Also, in the
construction of the inverse map, the rotation $Q$ depends continuously on $P$ because the
ray $PR_1$ does. Then $B = Q^{-1}P$ also is a continuous function of $P$, and this shows that
$f^{-1}$ is continuous as well. □

Note that $H$ can be identified by the rule $B \leftrightarrow (a, b)$ with the product space
(positive reals) $\times\ \mathbb{R}$. And the space of positive reals is homeomorphic by the log
function to the space $\mathbb{R}$ of all real numbers. Thus $H$ is homeomorphic to $\mathbb{R}^2$. Since
$SO_2$ is a circle, we find that

(4.4) $SL_2(\mathbb{R})$ is homeomorphic to the product space $S^1 \times \mathbb{R}^2$.
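The proof of Proposition (4.2) is constructive, and the factorization $P = QB$ can be sketched directly: the rotation $Q$ is chosen to carry $e_1$ to the direction of the first column of $P$, and then $B = Q^{-1}P$. The function name and the sample matrix below are illustrative assumptions.

```python
import math


def decompose(P):
    """Split P in SL_2(R) as P = Q B, with Q a rotation and B = [[a, b], [0, 1/a]], a > 0."""
    (p11, p12), (p21, p22) = P
    a = math.hypot(p11, p21)            # length of the first column P e_1
    c, s = p11 / a, p21 / a             # Q carries e_1 to the direction of P e_1
    Q = ((c, -s), (s, c))
    # B = Q^{-1} P, and Q^{-1} is the transpose of Q because Q is a rotation.
    B = (
        (c * p11 + s * p21, c * p12 + s * p22),
        (-s * p11 + c * p21, -s * p12 + c * p22),
    )
    return Q, B


P = ((2.0, 1.0), (1.0, 1.0))            # illustrative matrix with determinant 1
Q, B = decompose(P)
```

The lower-left entry of `B` should vanish up to rounding, its diagonal entries should be positive and mutually inverse, and multiplying `Q` by `B` should recover `P`.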

The special linear group can be related to the Lorentz group $O_{2,1}$ of
two-dimensional space-time by a method analogous to that used in Section 3 for the
orthogonal representation of $SU_2$. Let the coordinates in $\mathbb{R}^3$ be $y_1, y_2, t$, with the
Lorentz form

(4.5) $y_1 y_1' + y_2 y_2' - tt'$,

and let $W$ be the space of real trace-zero matrices. Using the basis

(4.6) $\mathbf{B} = \left( \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right)$,

we associate to a coordinate vector $(y_1, y_2, t)^t$ the matrix

(4.7) $A = \begin{bmatrix} y_1 & y_2 + t \\ y_2 - t & -y_1 \end{bmatrix}$.

We use this representation of trace-zero matrices because the Lorentz form (4.5) has
a simple matrix interpretation on such matrices:

(4.8) $\langle A, A' \rangle = y_1 y_1' + y_2 y_2' - tt' = \tfrac12 \operatorname{trace}(AA')$.

The group $SL_2$ acts on $W$ by conjugation,

(4.9) $P, A \mapsto PAP^{-1}$,

and this action preserves the Lorentz form on $W$, because

$\operatorname{trace}(AA') = \operatorname{trace}((PAP^{-1})(PA'P^{-1}))$,

as in the previous section. Since conjugation is a linear operator on $W$, it defines a
homomorphism $\varphi: SL_2 \to GL_3(\mathbb{R})$. Since conjugation preserves the Lorentz form,
the image $\varphi(P)$ of $P$ is an element of $O_{2,1}$.


(4.10) Theorem. The kernel of the homomorphism $\varphi$ is the subgroup $\{\pm I\}$, and
the image is the path-connected component $O_{2,1}^0$ of $O_{2,1}$ containing the identity $I$.
Therefore $O_{2,1}^0 \approx SL_2(\mathbb{R})/\{\pm I\}$.

It can be shown that the two-dimensional Lorentz group has four path-connected
components.

The fact that the kernel of $\varphi$ is $\{\pm I\}$ is easy to check, and the last assertion of
the theorem follows from the others. We omit the proof that the image of $\varphi$ is the
subgroup $O_{2,1}^0$. □
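The invariance of the Lorentz form under conjugation can be tested on an example. The sketch below assumes the correspondence $(y_1, y_2, t) \mapsto [[y_1, y_2 + t], [y_2 - t, -y_1]]$ between coordinate vectors and trace-zero matrices; the function names and sample values are hypothetical.

```python
def trace_zero(y1, y2, t):
    """Trace-zero matrix attached to the coordinate vector (y1, y2, t)."""
    return ((y1, y2 + t), (y2 - t, -y1))


def coords(A):
    """Recover (y1, y2, t) from a trace-zero matrix A."""
    return (A[0][0], (A[0][1] + A[1][0]) / 2, (A[0][1] - A[1][0]) / 2)


def mul(A, B):
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def inverse_sl2(P):
    """Inverse of a 2x2 matrix of determinant 1."""
    (p, q), (r, s) = P
    return ((s, -q), (-r, p))


def act(P, y):
    """Coordinates of P A P^{-1}, where A corresponds to y."""
    return coords(mul(mul(P, trace_zero(*y)), inverse_sl2(P)))


def lorentz(u, v):
    return u[0] * v[0] + u[1] * v[1] - u[2] * v[2]


def half_trace_product(A, B):
    M = mul(A, B)
    return (M[0][0] + M[1][1]) / 2


P = ((2, 1), (1, 1))                     # illustrative element of SL_2(R)
y, y_prime = (1, 2, 3), (0, 1, -1)       # illustrative coordinate vectors
before = lorentz(y, y_prime)
after = lorentz(act(P, y), act(P, y_prime))
```

The values `before` and `after` should coincide, and `half_trace_product` applied to the two matrices should return the same number, in line with (4.8).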





5. ONE-PARAMETER SUBGROUPS 


In Chapter 4, we defined the exponential of a matrix by the series

(5.1) $e^A = I + (1/1!)A + (1/2!)A^2 + (1/3!)A^3 + \cdots$.

We will now use this function to describe the homomorphisms from the additive
group of real numbers to the general linear group, which are differentiable functions
of the variable $t \in \mathbb{R}$. Such a homomorphism is called a one-parameter subgroup of
$GL_n$. (Actually, this use of the phrase “one-parameter subgroup” to describe such
homomorphisms is a misnomer. The image of $\varphi$ should be called the subgroup.)


(5.2) Proposition.

(a) Let $A$ be an arbitrary real or complex matrix, and let $GL_n$ denote $GL_n(\mathbb{R})$ or
$GL_n(\mathbb{C})$, according to the case. The map $\varphi: \mathbb{R}^+ \to GL_n$ defined by $\varphi(t) = e^{tA}$
is a group homomorphism.

(b) Conversely, let $\varphi: \mathbb{R}^+ \to GL_n$ be a homomorphism which is a differentiable
function of the variable $t \in \mathbb{R}^+$, and let $A$ denote its derivative $\varphi'(0)$ at the
origin. Then $\varphi(t) = e^{tA}$ for all $t$.

Proof. For any two real numbers $r, s$, the two matrices $rA$ and $sA$ commute.
So Chapter 4 (7.13) tells us that

(5.3) $e^{(r+s)A} = e^{rA}e^{sA}$.

This shows that $\varphi(t) = e^{tA}$ is a homomorphism. Conversely, let $\varphi$ be a
differentiable homomorphism $\mathbb{R}^+ \to GL_n$. The assumption that $\varphi$ is a homomorphism
allows us to compute its derivative at any point. Namely, it tells us that
$\varphi(t + \Delta t) = \varphi(\Delta t)\varphi(t)$ and $\varphi(t) = \varphi(0)\varphi(t)$. Thus

(5.4) $\dfrac{\varphi(t + \Delta t) - \varphi(t)}{\Delta t} = \dfrac{\varphi(\Delta t) - \varphi(0)}{\Delta t}\,\varphi(t)$.

Letting $\Delta t \to 0$, we find $\varphi'(t) = \varphi'(0)\varphi(t) = A\varphi(t)$. Therefore $\varphi(t)$ is a
matrix-valued function which solves the differential equation

(5.5) $\dfrac{d\varphi}{dt} = A\varphi$.

The function $e^{tA}$ is another solution, and both solutions take the value $I$ at $t = 0$. It
follows that $\varphi(t) = e^{tA}$ [see Chapter 4 (8.14)]. □

By the proposition we have just proved, the one-parameter subgroups all have
the form $\varphi(t) = e^{tA}$. They are in bijective correspondence with $n \times n$ matrices.


Now suppose that a subgroup $G$ of $GL_n$ is given. We may ask for
one-parameter subgroups of $G$, meaning homomorphisms $\varphi: \mathbb{R}^+ \to G$, or, equivalently,
homomorphisms to $GL_n$ whose image is in $G$. Since a one-parameter subgroup of
$GL_n$ is determined by a matrix, this amounts to asking for the matrices $A$ such that
$e^{tA} \in G$ for all $t$. It turns out that linear groups of positive dimension always have
one-parameter subgroups and that they are not hard to determine for a particular
group.

(5.7) Examples.

(a) The usual parametrization of the unit circle in the complex plane is a
one-parameter subgroup of $U_1$:

$t \mapsto e^{it} = \cos t + i \sin t$.

(b) A related example is obtained for $SO_2$ by setting $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. Then

$e^{tA} = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}$.

This is the standard parametrization of the rotation matrices.

In examples (a) and (b), the image of the homomorphism is the whole subgroup.

(c) Let $A$ be the $2 \times 2$ matrix unit $e_{12}$. Then since $A^2 = 0$, all but two terms of the
series expansion for the exponential vanish, and

$e^{tA} = I + e_{12}t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$.

In this case the exponential map defines an isomorphism from $\mathbb{R}^+$ to its image,
which is the group of triangular matrices with diagonal entries equal to 1.

(d) The one-parameter subgroups of $SU_2$ are the conjugates of the group of diagonal
special unitary matrices, the longitudes described in (2.13). □
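Examples (b) and (c), together with the homomorphism property $e^{(r+s)A} = e^{rA}e^{sA}$, can be reproduced by summing the series (5.1) numerically. The sketch below truncates the series after a fixed number of terms, which is an approximation chosen for illustration; the helper names are hypothetical.

```python
import math


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(A, terms=40):
    """Partial sum I + A + A^2/2! + ... of the exponential series (5.1)."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]            # holds A^k / k!, starting with k = 0
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def scaled(t, A):
    return [[t * entry for entry in row] for row in A]


def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


J = [[0.0, -1.0], [1.0, 0.0]]      # the matrix A of Example (5.7b)
E12 = [[0.0, 1.0], [0.0, 0.0]]     # the matrix unit e_12 of Example (5.7c)
r, s = 0.4, 0.9                    # illustrative parameter values

rotation = mat_exp(scaled(r, J))
product = mat_mul(mat_exp(scaled(r, J)), mat_exp(scaled(s, J)))
combined = mat_exp(scaled(r + s, J))
shear = mat_exp(scaled(r, E12))
```

Here `rotation` should match the rotation matrix through the angle `r`, `product` should agree with `combined`, and `shear` should be the triangular matrix with entry `r` above the diagonal.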

Instead of attempting to state a general theorem describing one-parameter sub¬ 
groups of a group, we will determine them for the orthogonal and special linear 
groups as examples of the methods used. We will need to know that the exponential 
function on matrices has an inverse function. 

(5.8) Proposition. The matrix exponential maps a small neighborhood $S$ of 0 in
$\mathbb{R}^{n \times n}$ homeomorphically to a neighborhood $T$ of $I$.

Proof. This proposition follows from the Inverse Function Theorem, which
states that a differentiable function $f: \mathbb{R}^k \to \mathbb{R}^k$ has an inverse function at a point $p$
if the Jacobian matrix $(\partial f_i/\partial x_j)(p)$ is invertible. We must check this for the matrix
exponential at the zero matrix in $\mathbb{R}^{n \times n}$. This is a notationally unpleasant but easy
computation. Let us denote a variable matrix by $X$. The Jacobian matrix is the
$n^2 \times n^2$ matrix whose entries are $(\partial (e^X)_{\alpha\beta}/\partial X_{ij})|_{X=0}$. We use the fact that
$\frac{d}{dt}(e^{tA})|_{t=0} = A$. It follows directly from the definition of the partial derivative that
$(\partial e^X/\partial X_{ij})|_{X=0} = (de^{te_{ij}}/dt)|_{t=0} = e_{ij}$. Therefore $(\partial (e^X)_{\alpha\beta}/\partial X_{ij})|_{X=0} = 0$ if $(\alpha, \beta) \neq (i, j)$
and $(\partial (e^X)_{ij}/\partial X_{ij})|_{X=0} = 1$. The Jacobian matrix is the $n^2 \times n^2$ identity matrix. □

We will now describe one-parameter subgroups of the orthogonal group $O_n$.
Here we are asking for the matrices $A$ such that $e^{tA}$ is orthogonal for all $t$.

(5.9) Lemma. If $A$ is skew-symmetric, then $e^A$ is orthogonal. Conversely, there is
a neighborhood $S'$ of 0 in $\mathbb{R}^{n \times n}$ such that if $e^A$ is orthogonal and $A \in S'$, then $A$ is
skew-symmetric.

Proof. To avoid confusing the variable $t$ with the symbol for the transpose
matrix, we denote the transpose of the matrix $A$ by $A^*$ here. If $A$ is skew-symmetric,
then $e^{(A^*)} = e^{-A}$. The relation $e^{(A^*)} = (e^A)^*$ is clear from the definition of the
exponential, and $e^{-A} = (e^A)^{-1}$ by Chapter 4 (8.10). Thus $(e^A)^* = e^{(A^*)} = e^{-A} = (e^A)^{-1}$.
This shows that $e^A$ is orthogonal. For the converse, we choose $S'$ small enough so
that if $A \in S'$, then $-A$ and $A^*$ are in the neighborhood $S$ of Proposition (5.8).
Suppose that $A \in S'$ and that $e^A$ is orthogonal. Then $e^{(A^*)} = e^{-A}$, and since the exponential
is injective on $S$ by Proposition (5.8), this means that $A^* = -A$, that is, $A$ is skew-symmetric. □

(5.10) Corollary. The one-parameter subgroups of the orthogonal group $O_n$ are
the homomorphisms $t \mapsto e^{tA}$, where $A$ is a real skew-symmetric matrix.

Proof. If $A$ is skew-symmetric, $tA$ is skew-symmetric for all $t$. So $e^{tA}$ is
orthogonal for all $t$, which means that $t \mapsto e^{tA}$ is a one-parameter subgroup of $O_n$.
Conversely, suppose that $e^{tA}$ is orthogonal for all $t$. For sufficiently small $\epsilon$, $\epsilon A$ is in
the neighborhood $S'$ of the lemma, and $e^{\epsilon A}$ is orthogonal. Therefore $\epsilon A$ is
skew-symmetric, and this implies that $A$ is skew-symmetric too. □
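The first half of Lemma (5.9) can be checked on a sample $3 \times 3$ skew-symmetric matrix by summing the exponential series and testing whether the result is orthogonal. The truncation length and the particular matrix below are illustrative assumptions.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(column) for column in zip(*A)]


def mat_exp(A, terms=40):
    """Partial sum of the exponential series I + A + A^2/2! + ..."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


# An illustrative skew-symmetric matrix: its transpose is its negative.
A = [
    [0.0, -0.3, 0.5],
    [0.3, 0.0, -0.2],
    [-0.5, 0.2, 0.0],
]
expA = mat_exp(A)
gram = mat_mul(transpose(expA), expA)   # identity exactly when expA is orthogonal
```

Up to rounding, `gram` should be the $3 \times 3$ identity matrix.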

This corollary is illustrated by Example (5.7b).

Next, let us consider the special linear group $SL_n(\mathbb{R})$.

(5.11) Proposition. Let $A$ be a matrix whose trace is zero. Then $e^A$ has
determinant 1. Conversely, there is a neighborhood $S'$ of 0 in $\mathbb{R}^{n \times n}$ such that if $A \in S'$ and
$\det e^A = 1$, then trace $A = 0$.

Proof. The first assertion follows from the pretty formula

(5.12) $e^{\operatorname{tr} A} = \det e^A$,

where $\operatorname{tr} A$ denotes the trace of the matrix. This formula follows in turn from the fact
that if the eigenvalues of a complex matrix $A$ are $\lambda_1, \ldots, \lambda_n$, then the eigenvalues of
$e^A$ are $e^{\lambda_1}, \ldots, e^{\lambda_n}$. We leave the proof of this fact as an exercise. Using it, we find

$e^{\operatorname{tr} A} = e^{\lambda_1 + \cdots + \lambda_n} = e^{\lambda_1} \cdots e^{\lambda_n} = \det e^A$.

For the converse, we note that if $|x| < 1$, $e^x = 1$ implies $x = 0$. We choose $S'$
small enough so that $|\operatorname{tr} A| < 1$ if $A \in S'$. Then if $\det e^A = e^{\operatorname{tr} A} = 1$ and if $A \in S'$,
$\operatorname{tr} A = 0$. □

(5.13) Corollary. The one-parameter subgroups of the special linear group
$SL_n(\mathbb{R})$ are the homomorphisms $t \mapsto e^{tA}$, where $A$ is a real $n \times n$ matrix whose
trace is zero. □
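Formula (5.12) lends itself to a quick numerical check on $2 \times 2$ matrices. The sketch below again approximates $e^A$ by a partial sum of the series; the two sample matrices are arbitrary illustrative choices, one with trace zero and one without.

```python
import math


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(A, terms=40):
    """Partial sum of the exponential series I + A + A^2/2! + ..."""
    n = len(A)
    total = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


traceless = [[0.2, 1.0], [0.5, -0.2]]    # trace zero, so e^A should lie in SL_2(R)
general = [[0.3, 1.0], [0.5, 0.1]]       # trace 0.4

det_traceless = det2(mat_exp(traceless))
det_general = det2(mat_exp(general))
```

One expects `det_traceless` to be 1 and `det_general` to equal $e^{0.4}$, both up to rounding error.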

The simplest one-parameter subgroup of $SL_2(\mathbb{R})$ is described in Example
(5.7c).


6. THE LIE ALGEBRA

As always, we think of a linear group $G$ as a subset of $\mathbb{R}^{n \times n}$ or of $\mathbb{C}^{n \times n}$. The space of
vectors tangent to $G$ at the identity matrix $I$, which we will describe in this section,
is called the Lie algebra of the group.

We will begin by reviewing the definition of tangent vector. If
$\varphi(t) = (\varphi_1(t), \ldots, \varphi_k(t))$ is a differentiable path in $\mathbb{R}^k$, its velocity vector $v = \varphi'(t)$ is tangent
to the path at the point $x = \varphi(t)$. This is the basic observation from which the
definition of tangent vector is derived.

Suppose that we are given a subset $S$ of $\mathbb{R}^k$. A vector $v$ is said to be tangent to
$S$ at a point $x$ if there is a differentiable path $\varphi(t)$ lying entirely in $S$, such that
$\varphi(0) = x$ and $\varphi'(0) = v$.

If our subset $S$ is the locus of zeros of one or more polynomial functions
$f(x_1, \ldots, x_k)$, it is called a real algebraic set:

(6.1) $S = \{x \mid f(x) = 0\}$.

For example, the unit circle in $\mathbb{R}^2$ is a real algebraic set because it is the locus of
zeros of the polynomial $f(x_1, x_2) = x_1^2 + x_2^2 - 1$.

The chain rule for differentiation provides a necessary condition for a vector to
be tangent to a real algebraic set $S$. Let $\varphi(t)$ be a path in $S$, and let $x = \varphi(t)$ and
$v = \varphi'(t)$. Since the path is in $S$, the functions $f(\varphi(t))$ vanish identically; hence
their derivatives also vanish identically:

(6.2) $0 = \dfrac{d}{dt} f(\varphi(t)) = \dfrac{\partial f}{\partial x_1} v_1 + \cdots + \dfrac{\partial f}{\partial x_k} v_k = (\nabla f(x) \cdot v)$,

where $\nabla f = \left( \dfrac{\partial f}{\partial x_1}, \ldots, \dfrac{\partial f}{\partial x_k} \right)$ is the gradient vector.


(6.3) Corollary. Let $S$ be a real algebraic set in $\mathbb{R}^k$, defined as the locus of zeros
of one or more polynomial functions $f(x)$. The tangent vectors to $S$ at $x$ are
orthogonal to the gradients $\nabla f(x)$. □


For instance, if $S$ is the unit circle and $x$ is the point $(1, 0)$, then the gradient
vector $\nabla f(x)$ is $(2, 0)$. Corollary (6.3) tells us that tangent vectors at $(1, 0)$ have the
form $(0, c)$, that is, that they are vertical, which is as it should be.

Computing tangent vectors by means of parametrized paths is clumsy because
there are many paths with the same tangent. If we are interested only in the tangent
vector, then we can throw out all of the information contained in a path except for
the first-order term of its Taylor expansion. To do this systematically, we introduce a
formal infinitesimal element $\epsilon$. This means that we work algebraically with the rule

(6.4) $\epsilon^2 = 0$.

Just as with complex numbers, where the rule is $i^2 = -1$, we can use this rule to
define a multiplication on the vector space

$E = \{a + b\epsilon \mid a, b \in \mathbb{R}\}$

of formal linear combinations of $(1, \epsilon)$ with real coefficients. The rule for
multiplication is

(6.5) $(a + b\epsilon)(c + d\epsilon) = ac + (bc + ad)\epsilon$.

In other words, we expand formally, using the relations $\epsilon c = c\epsilon$ for all $c \in \mathbb{R}$ and
$\epsilon^2 = 0$. As with complex numbers, addition is vector addition:

$(a + b\epsilon) + (c + d\epsilon) = (a + c) + (b + d)\epsilon$.

The main difference between $\mathbb{C}$ and $E$ is that $E$ is not a field, because $\epsilon$ has no
multiplicative inverse. [It is a ring (see Chapter 10).]

Given a point $x$ of $\mathbb{R}^k$ and a vector $v \in \mathbb{R}^k$, the sum $x + v\epsilon$ is a vector with
entries in $E$ which we interpret intuitively as an infinitesimal change in $x$, in the
direction of $v$. Notice that we can evaluate a polynomial $f(x) = f(x_1, \ldots, x_k)$ at $x + v\epsilon$
using Taylor’s expansion. Since $\epsilon^2 = 0$, the terms of degree $\ge 2$ in $\epsilon$ drop out, and
we are left with an element of $E$:

(6.6) $f(x + v\epsilon) = f(x) + \left( \dfrac{\partial f}{\partial x_1} v_1 + \cdots + \dfrac{\partial f}{\partial x_k} v_k \right)\epsilon = f(x) + (\nabla f(x) \cdot v)\epsilon$.





Working with rule (6.4) amounts to ignoring the higher-order terms in $\epsilon$. Thus the
dot product $(\nabla f(x) \cdot v)$ represents the infinitesimal change in $f$ which results when
we make an infinitesimal change in $x$ in the direction of $v$.
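The arithmetic of $E$ is simple enough to implement directly, and doing so turns formula (6.6) into a computation: evaluating a polynomial at $x + v\epsilon$ returns $f(x)$ together with $(\nabla f(x) \cdot v)$. The class and function names in the sketch below are hypothetical, and only addition, subtraction and multiplication are implemented, which is all a polynomial needs.

```python
class Dual:
    """An element a + b*eps of E, where eps*eps = 0."""

    def __init__(self, a, b=0.0):
        self.a = a  # real part
        self.b = b  # coefficient of eps

    @staticmethod
    def lift(value):
        """Regard a real number c as the element c + 0*eps."""
        return value if isinstance(value, Dual) else Dual(value, 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.a - other.a, self.b - other.b)

    def __mul__(self, other):
        # Rule (6.5): (a + b eps)(c + d eps) = ac + (bc + ad) eps.
        other = Dual.lift(other)
        return Dual(self.a * other.a, self.b * other.a + self.a * other.b)

    __rmul__ = __mul__


def circle(x1, x2):
    """f(x1, x2) = x1^2 + x2^2 - 1, the polynomial defining the unit circle."""
    return x1 * x1 + x2 * x2 - 1


def evaluate_at_infinitesimal(f, x, v):
    """Return f(x + v*eps) as the pair (f(x), gradient of f at x dotted with v)."""
    args = [Dual(xi, vi) for xi, vi in zip(x, v)]
    result = f(*args)
    return (result.a, result.b)


vertical = evaluate_at_infinitesimal(circle, (1.0, 0.0), (0.0, 3.0))
horizontal = evaluate_at_infinitesimal(circle, (1.0, 0.0), (1.0, 0.0))
```

At the point $(1, 0)$ of the unit circle, the vertical direction gives $f(x + v\epsilon) = 0$, while the horizontal direction $(1, 0)$ gives $2\epsilon$, matching the gradient $(2, 0)$ found earlier.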

Going back to a real algebraic set $S$ defined by the polynomial equations
$f(x) = 0$, let $x$ be a point of $S$. Then $f(x) = 0$, so (6.6) tells us that

(6.7) $f(x + v\epsilon) = 0$ if and only if $(\nabla f(x) \cdot v) = 0$,

which is the same as the condition we obtained in Corollary (6.3). This suggests the
following definition: Let $S$ be a real algebraic set, defined by the polynomial
equations $f(x) = 0$. A vector $v$ is called an infinitesimal tangent to $S$ at $x$ if

(6.8) $f(x + v\epsilon) = 0$.

(6.9) Corollary. Let $x$ be a point of a real algebraic set $S$. Every tangent to $S$ at $x$
is an infinitesimal tangent. □

Notice that if we fix $x \in S$, the equations $(\nabla f(x) \cdot v) = 0$ are linear and
homogeneous in $v$. So the infinitesimal tangent vectors to $S$ at $x$ form a subspace of the
space of all vectors.

Actually, our terminology is slightly ambiguous. The definition of an
infinitesimal tangent depends on the equations $f$, not only on the set $S$. We must have
particular equations in mind when speaking of infinitesimal tangents.

For sets $S$ which are sufficiently smooth, the converse of (6.9) is also true:
Every infinitesimal tangent is a tangent vector. When this is the case, we can
compute the space of tangent vectors at a point $x \in S$ by solving the linear equations
$(\nabla f(x) \cdot v) = 0$ for $v$, which is relatively easy. However, this converse will not be
true at "singular points" of the set $S$, or if the defining equations for $S$ are chosen
poorly. For example, let $S$ denote the union of the two coordinate axes in $\mathbb{R}^2$. This is
a real algebraic set defined by the single equation $x_1x_2 = 0$. It is clear that at the
origin a tangent vector must be parallel to one of the two axes. On the other hand,
$\nabla f = (x_2, x_1)$, which is zero when $x_1 = x_2 = 0$. Therefore every vector is an
infinitesimal tangent to $S$ at the origin.
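
The same conclusion can be read off directly from definition (6.8). As an added worked check, substitute $x = (0, 0)$ and an arbitrary $v = (v_1, v_2)$ into $f(x_1, x_2) = x_1x_2$:

```latex
f(0 + v\epsilon) = (v_1\epsilon)(v_2\epsilon) = v_1 v_2\,\epsilon^2 = 0
\quad \text{for every } v = (v_1, v_2).
```

So (6.8) holds for all $v$, although only the vectors parallel to an axis are velocity vectors of paths in $S$.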

This completes our general discussion of tangent vectors. We will now apply
this discussion to the case that the set $S$ is one of our linear groups $G$ in $\mathbb{R}^{n \times n}$ or
$\mathbb{C}^{n \times n}$. The tangent vectors to $G$ will be $n^2$-dimensional vectors, and we will
represent them by matrices too. As we said earlier, the vectors tangent to $G$ at the
identity $I$ form the Lie algebra of the group.

The first thing to notice is that every one-parameter subgroup $e^{tA}$ of our linear
group $G$ is a parametrized path. We already know that its velocity vector
$(de^{tA}/dt)_{t=0}$ is $A$. So $A$ represents a tangent vector to $G$ at the identity; it is in the
Lie algebra. For example, the unitary group $U_1$ is the unit circle in the complex
plane, and $e^{ti}$ is a one-parameter subgroup of $U_1$. The velocity vector of this
one-parameter subgroup at $t = 0$ is the vector $i$, which is indeed a tangent vector to the
unit circle at the point 1.

A matrix group $G$ which is a real algebraic set in $\mathbb{R}^{n \times n}$ is called a real
algebraic group. The classical linear groups such as $SL_n(\mathbb{R})$ and $O_n$ are real algebraic,
because their defining equations are polynomial equations in the matrix entries. For
example, the group $SL_2(\mathbb{R})$ is defined by the single polynomial equation $\det P = 1$:

$x_{11}x_{22} - x_{12}x_{21} - 1 = 0$.

The orthogonal group $O_3$ is defined by nine polynomials $f_{ij}$ expressing the condition
$P^tP = I$:

$f_{ij} = x_{1i}x_{1j} + x_{2i}x_{2j} + x_{3i}x_{3j} - \delta_{ij} = 0$, where $\delta_{ij} = 0$ if $i \neq j$ and $\delta_{ij} = 1$ if $i = j$.


Complex groups such as the unitary groups can also be made into real algebraic
groups in $\mathbb{R}^{2n \times n}$ by separating the matrix entries into their real and imaginary parts.

It is a fact that for every infinitesimal tangent $A$ to a real algebraic group $G$ at
the identity, $e^{tA}$ is a one-parameter subgroup of $G$. In other words, there is a
one-parameter subgroup leading out from the identity in an arbitrary tangent direction.
This is quite remarkable for a nonabelian group, but it is true with essentially no
restriction. Unfortunately, though this fact is rather easy to check for a particular
group, it is fairly hard to give a general proof. Therefore we will content ourselves
with verifying particular cases.

Having an infinitesimal element available, we may work with matrices whose
entries are in $E$. Such a matrix will have the form $A + B\epsilon$, where $A, B$ are real
matrices. Intuitively, $A + B\epsilon$ represents an infinitesimal change in $A$ in the direction of
the matrix $B$. The rule for multiplying two such matrices is the same as (6.5):

(6.10) $(A + B\epsilon)(C + D\epsilon) = AC + (AD + BC)\epsilon$.

The product $B\epsilon D\epsilon$ is zero because $(b_{ij}\epsilon)(d_{kl}\epsilon) = 0$ for all values of the indices.

Let $G$ be a real algebraic group. To determine its infinitesimal tangent vectors
at the identity, we must determine the matrices $A$ such that

(6.11) $I + A\epsilon$,

which represents an infinitesimal change in $I$ in the direction of the matrix $A$,
satisfies the equations defining $G$. This is the definition (6.8) of an infinitesimal
tangent.

Let us make this computation for the special linear group $SL_n(\mathbb{R})$. The defining
equation for this group is $\det P = 1$. So $A$ is an infinitesimal tangent vector if
$\det(I + A\epsilon) = 1$. To describe this condition, we must calculate the change in the
determinant when we make an infinitesimal change in $I$. The formula is nice:

(6.12) $\det(I + A\epsilon) = 1 + (\operatorname{trace} A)\epsilon$.

The proof of this formula is left as an exercise. Using it, we find that $A$ is an
infinitesimal tangent vector if and only if $\operatorname{trace} A = 0$.
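
Although the general proof is left as an exercise, formula (6.12) can be checked by hand for $n = 2$. The following expansion is an added illustration, writing $A = (a_{ij})$ and using only $\epsilon^2 = 0$:

```latex
\det(I + A\epsilon)
  = \det\begin{bmatrix} 1 + a_{11}\epsilon & a_{12}\epsilon \\ a_{21}\epsilon & 1 + a_{22}\epsilon \end{bmatrix}
  = (1 + a_{11}\epsilon)(1 + a_{22}\epsilon) - a_{12}a_{21}\,\epsilon^2
  = 1 + (a_{11} + a_{22})\epsilon .
```

The off-diagonal product carries a factor $\epsilon^2$ and so vanishes; only the diagonal entries contribute to the coefficient of $\epsilon$.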
(6.13) Proposition. The following conditions on a real $n \times n$ matrix $A$ are
equivalent:

(i) $\operatorname{trace} A = 0$;

(ii) $e^{tA}$ is a one-parameter subgroup of $SL_n(\mathbb{R})$;

(iii) $A$ is in the Lie algebra of $SL_n(\mathbb{R})$;

(iv) $A$ is an infinitesimal tangent to $SL_n(\mathbb{R})$ at $I$.

Proof. Proposition (5.11) tells us that (i) $\Rightarrow$ (ii). Since $A$ is tangent to the path
$e^{tA}$ at $t = 0$, (ii) $\Rightarrow$ (iii). The implication (iii) $\Rightarrow$ (iv) is (6.9), and (iv) $\Rightarrow$ (i)
follows from (6.12). □

There is a general principle at work here. We have three sets of matrices $A$:
those such that $e^{tA}$ is a one-parameter subgroup of $G$, those which are in the Lie
algebra, and those which are infinitesimal tangents. Let us denote these three sets by
$\mathrm{Exp}(G)$, $\mathrm{Lie}(G)$, and $\mathrm{Inf}(G)$. They are related by the following inclusions:

(6.14) $\mathrm{Exp}(G) \subset \mathrm{Lie}(G) \subset \mathrm{Inf}(G)$.

The first inclusion is true because $A$ is the tangent vector to $e^{tA}$ at $t = 0$, and the
second holds because every tangent vector is an infinitesimal tangent. If $\mathrm{Exp}(G) = \mathrm{Inf}(G)$,
then these two sets are also equal to $\mathrm{Lie}(G)$. Since the computations of
$\mathrm{Exp}(G)$ and $\mathrm{Inf}(G)$ are easy, this gives us a practical way of determining the Lie
algebra. A general theorem exists which implies that $\mathrm{Exp}(G) = \mathrm{Inf}(G)$ for every real
algebraic group, provided that its defining equations are chosen properly. However,
it isn't worthwhile proving the general theorem here.

We will now make the computation for the orthogonal group $O_n$. The defining
equation for $O_n$ is the matrix equation $P^tP = I$. In order for $A$ to be an infinitesimal
tangent at the identity, it must satisfy the relation

(6.15) $(I + A\epsilon)^t(I + A\epsilon) = I$.

The left side of this relation expands to $I + (A^t + A)\epsilon$, so the condition that $I + A\epsilon$
be orthogonal is $A^t + A = 0$, or $A$ is skew-symmetric. This agrees with the condition
(5.10) for $e^{tA}$ to be a one-parameter subgroup of $O_n$.

(6.16) Proposition. The following conditions on a real $n \times n$ matrix $A$ are
equivalent:

(i) $A$ is skew-symmetric;

(ii) $e^{tA}$ is a one-parameter subgroup of $O_n$;

(iii) $A$ is in the Lie algebra of $O_n$;

(iv) $A$ is an infinitesimal tangent to $O_n$ at $I$. □

The Lie algebra of a linear group has an additional structure, an operation
called the Lie bracket. The Lie bracket is the law of composition defined by the rule

(6.17) $[A, B] = AB - BA$.

This law of composition is not associative. It does, however, satisfy an identity
called the Jacobi identity,

(6.18) $[A, [B, C]] + [B, [C, A]] + [C, [A, B]] = 0$,

which is a substitute for the associative law.
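
Both claims are easy to test on examples. The following sketch computes brackets of small integer matrices; the helper names and the three sample matrices are choices made here for illustration.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def neg(A):
    return [[-a for a in row] for row in A]


def bracket(A, B):
    """Lie bracket [A, B] = AB - BA, as in (6.17)."""
    return add(matmul(A, B), neg(matmul(B, A)))


# Three sample trace-zero matrices.
A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
C = [[1, 0], [0, -1]]

# Left side of the Jacobi identity (6.18).
jacobi = add(add(bracket(A, bracket(B, C)), bracket(B, bracket(C, A))),
             bracket(C, bracket(A, B)))
```

For these matrices `jacobi` is the zero matrix, while `bracket(A, bracket(B, C))` and `bracket(bracket(A, B), C)` differ, so the bracket is not associative.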

To show that the bracket is a law of composition on the Lie algebra, we must
check that if $A, B$ are in $\mathrm{Lie}(G)$, then $[A, B]$ is also in $\mathrm{Lie}(G)$. This can be done easily
for any particular group. For the special linear group, the required verification is
that if $A, B$ have trace zero, then $AB - BA$ also has trace zero. This is true, because
$\operatorname{trace} AB = \operatorname{trace} BA$. Or let $G = O_n$, so that the Lie algebra is the space of
skew-symmetric matrices. We must verify that if $A, B$ are skew, then $[A, B]$ is skew too:

$[A, B]^t = (AB - BA)^t = B^tA^t - A^tB^t = BA - AB = -[A, B]$,

as required.

The bracket operation is important because it is the infinitesimal version of the
commutator $PQP^{-1}Q^{-1}$. To see why this is so, we must work with two infinitesimals
$\epsilon, \delta$, using the rules $\epsilon^2 = \delta^2 = 0$ and $\epsilon\delta = \delta\epsilon$. Note that the inverse of the matrix
$I + A\epsilon$ is $I - A\epsilon$. So if $P = I + A\epsilon$ and $Q = I + B\delta$, the commutator expands to

(6.19) $(I + A\epsilon)(I + B\delta)(I - A\epsilon)(I - B\delta) = I + (AB - BA)\epsilon\delta$.

Intuitively, the bracket is in the Lie algebra because the product of two elements in
$G$, even infinitesimal ones, is in $G$, and therefore the commutator of two elements is
also in $G$.
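
The expansion (6.19) can be verified in two steps. The following worked computation is added as an illustration; it uses only $\epsilon^2 = \delta^2 = 0$ and $\epsilon\delta = \delta\epsilon$:

```latex
\begin{aligned}
(I + A\epsilon)(I + B\delta) &= I + A\epsilon + B\delta + AB\,\epsilon\delta, \\
(I - A\epsilon)(I - B\delta) &= I - A\epsilon - B\delta + AB\,\epsilon\delta, \\
(I + A\epsilon + B\delta + AB\,\epsilon\delta)(I - A\epsilon - B\delta + AB\,\epsilon\delta)
  &= I + (AB - AB - BA + AB)\,\epsilon\delta \\
  &= I + (AB - BA)\,\epsilon\delta .
\end{aligned}
```

The first-order terms cancel, and the four contributions to the coefficient of $\epsilon\delta$ come from $I \cdot AB$, $A \cdot (-B)$, $B \cdot (-A)$, and $AB \cdot I$, in that order.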

Using the bracket operation, we can also define the concept of Lie algebra
abstractly.

(6.20) Definition. A Lie algebra $V$ over a field $F$ is a vector space together with a
law of composition

$V \times V \to V$, $(v, w) \mapsto [v, w]$,

called the bracket, having these properties:

(i) bilinearity: $[v_1 + v_2, w] = [v_1, w] + [v_2, w]$, $[cv, w] = c[v, w]$,
$[v, w_1 + w_2] = [v, w_1] + [v, w_2]$, $[v, cw] = c[v, w]$,

(ii) skew symmetry: $[v, v] = 0$,

(iii) Jacobi identity: $[u, [v, w]] + [v, [w, u]] + [w, [u, v]] = 0$,

for all $u, v, w \in V$ and all $c \in F$.

The importance of Lie algebras comes from the fact that, being vector spaces,
they are much easier to work with than the linear groups themselves, and at the same
time the classical groups are nearly determined by their Lie algebras. In other
words, the infinitesimal structure of the group at the identity element is almost
enough to determine the group.


7. TRANSLATION IN A GROUP


We will use one more notion from topology in this section: the definition of
manifold in $\mathbb{R}^k$. This definition is reviewed in the appendix [Definition (3.12)]. Do not be
discouraged if you are not familiar with the concept of manifold. You can learn what
is necessary without much trouble as we go along.

Let $P$ be a fixed element of a matrix group $G$. We know that left multiplication
by $P$ is a bijective map from $G$ to itself:

(7.1) $m_P\colon G \to G$, $X \mapsto PX$,

because it has the inverse function $m_{P^{-1}}$. The maps $m_P$ and $m_{P^{-1}}$ are continuous,
because matrix multiplication is continuous. Thus $m_P$ is a homeomorphism from $G$ to
itself (not a homomorphism). It is also called left translation by $P$, in analogy with
translation in the plane, which is left translation in the additive group $\mathbb{R}^{2+}$.

The important property of a group which is implied by the existence of these
maps is homogeneity. Multiplication by $P$ is a homeomorphism which carries the
identity element $I$ to $P$. So the topological structure of the group $G$ is the same near
$I$ as it is near $P$, and since $P$ is arbitrary, it is the same in the neighborhoods of any
two points of the group. This is analogous to the fact that the plane looks the same
at any two points.

Left multiplication in $SU_2$ happens to be defined by an orthogonal change of
the coordinates $(x_1, x_2, x_3, x_4)$, so it is a rigid motion of the 3-sphere. But
multiplication by a matrix needn't be a rigid motion, so the sense in which any group is
homogeneous is weaker. For example, let $G$ be the group of real invertible diagonal $2 \times 2$
matrices, and let us identify the elements of $G$ with the points $(a, d)$ in the plane
which are not on the coordinate axes. Multiplication by the matrix

(7.2) $P = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$

distorts the group $G$, but it does so continuously.



(7.3) Figure. Left multiplication in a group.





Now the only geometrically reasonable subsets of $\mathbb{R}^k$ which have this
homogeneity property are manifolds. A manifold $M$ of dimension $d$ is a subset which is
locally homeomorphic to $\mathbb{R}^d$ at any one of its points, meaning that every point
$p \in M$ has a neighborhood homeomorphic to an open set in $\mathbb{R}^d$ [see Appendix
(3.12)]. It isn't surprising that the classical groups, being homogeneous, are
manifolds, though there are subgroups of $GL_n$ which aren't. The group $GL_n(\mathbb{Q})$ of
invertible matrices with rational coefficients, for example, is a rather ugly set when viewed
geometrically, though it is an interesting group. The following theorem gives a
satisfactory answer to the question of which linear groups are manifolds:

(7.4) Theorem. Let $G$ be a subgroup of $GL_n(\mathbb{R})$ which is a closed set in $\mathbb{R}^{n \times n}$.
Then $G$ is a manifold.

Giving the proof of this theorem here would take us too far afield. Instead, we will
illustrate the theorem by showing that the orthogonal groups $O_n$ are manifolds. The
proofs for other classical groups are similar.

(7.5) Proposition. The orthogonal group $O_n$ is a manifold of dimension
$\frac{1}{2}n(n - 1)$.

Proof. Let us denote the group $O_n$ by $G$ and denote its Lie algebra, the space
of skew-symmetric matrices, by $L$. Proposition (5.9) tells us that for matrices $A$ near
0, $A \in L$ if and only if $e^A \in G$. Also, the exponential is a homeomorphism from a
neighborhood of 0 in $\mathbb{R}^{n \times n}$ to a neighborhood of $I$. Putting these two facts together,
we find that the exponential defines a homeomorphism from a neighborhood of 0 in
$L$ to a neighborhood of $I$ in $G$. Since $L$ is a vector space of dimension $\frac{1}{2}n(n - 1)$, it
is a manifold. This shows that the condition of being a manifold is satisfied by the
orthogonal group at the identity. On the other hand, we saw above that any two
points in $G$ have homeomorphic neighborhoods. Therefore $G$ is a manifold, as
claimed. □




(7.6) Figure.

Here is another application of the principle of homogeneity:

(7.7) Proposition. Let $G$ be a path-connected matrix group, and let $H \subset G$ be a
subgroup which contains a nonempty open subset of $G$. Then $H = G$.

Proof. By hypothesis, $H$ contains a nonempty open subset $U$ of $G$. Since left
multiplication by $g \in G$ is a homeomorphism, $gU$ is also open in $G$. Each translate
$gU$ is contained in a single coset of $H$, namely in $gH$. Since the translates of $U$ cover
$G$, they cover each coset. In this way, each coset is a union of open subsets of $G$, and
hence it is open itself. So $G$ is partitioned into open subsets, the cosets of $H$. Now
a path-connected set is not a disjoint union of proper open subsets [see Appendix,
Proposition (3.11)]. Thus there can be only one coset, and $H = G$. □

We will now apply this proposition to determine the normal subgroups of $SU_2$.

(7.8) Theorem. The only proper normal subgroup of $SU_2$ is its center $\{\pm I\}$.

Since there is a surjective map $\varphi\colon SU_2 \to SO_3$ whose kernel is $\{\pm I\}$, the
rotation group is isomorphic to a quotient group of $SU_2$ [Chapter 2 (10.9)]:

(7.9) $SO_3 \cong SU_2/\{\pm I\}$.

(7.10) Corollary. $SO_3$ is a simple group; that is, it has no proper normal
subgroup.

Proof. The inverse image of a normal subgroup in $SO_3$ is a normal subgroup
of $SU_2$ which contains $\{\pm I\}$ [Chapter 2 (7.4)]. Theorem (7.8) tells us that there are
no proper ones. □

Proof of Theorem (7.8). It is enough to show that if $N$ is a normal subgroup
of $SU_2$ which is not contained in the center $\{\pm I\}$, then $N$ is the whole group. Now
since $N$ is normal, it is a union of conjugacy classes [Chapter 6 (2.5)]. And we have
seen that the conjugacy classes are the latitudes, the 2-spheres (2.8). By assumption,
$N$ contains a matrix $P \neq \pm I$, so it contains the whole conjugacy class $C = C_P$,
which is a 2-sphere. Intuitively, this set looks big enough to generate $SU_2$. For it has
dimension 2 and is not a subgroup. So the set $S$ of all products $P^{-1}Q$ with $P, Q \in C$ is
larger than $C$. Therefore $S$ ought to have dimension 3, which is the dimension of $SU_2$
itself, so it ought to contain an open set in the group.

(7.11) Figure.

To make this intuitive reasoning precise, we choose a nonconstant continuous
map $t \mapsto P_t$ from the unit interval $[0, 1]$ to $C$ such that $P_0 = P$ and $P_1 \neq P$. We form the
path

(7.12) $Q_t = P^{-1}P_t$.

Then $Q_0 = I$, and $Q_1 \neq I$, so this path leads out from $I$. Since $P$ and $P_t$ are in $N$, $Q_t$
is in $N$ for every $t \in [0, 1]$. We don't need to know anything else about the path $Q_t$.

Let $f(t)$ be the function $\operatorname{trace} Q_t$. This is a continuous function on the interval
$[0, 1]$. Note that $f(0) = 2$, while $f(1) = \tau < 2$ because $Q_1 \neq I$. By continuity, all
values between $\tau$ and 2 are taken on by $f$ in the interval.

Since $N$ is normal, it contains the conjugacy class of $Q_t$ for every $t$. So since
$\operatorname{trace} Q_t$ takes on all values near 2, Proposition (2.9) tells us that $N$ contains all
matrices in $SU_2$ whose trace is sufficiently near to 2, and this includes all matrices
sufficiently near to $I$. So $N$ contains an open neighborhood of $I$ in $SU_2$. Now $SU_2$,
being a sphere, is path-connected, so Proposition (7.7) completes the proof. □

We can also apply translation in a group $G$ to tangent vectors. If $A$ is a tangent
vector at the identity and if $P \in G$ is arbitrary, then $PA$ is a tangent vector to $G$ at
the point $P$. Intuitively, this is because $P(I + A\epsilon) = P + PA\epsilon$ is the product of
elements in $G$, so it lies in $G$ itself. As always, this heuristic is easy to check for a
particular group. We fix $A$, and associate the tangent vector $PA$ to the element $P$ of $G$. In
this way we obtain what is called a tangent vector field on the group $G$. Since $A$ is
nonzero and $P$ is invertible, this vector field does not vanish at any point. Now just
the existence of a tangent vector field which is nowhere zero puts strong restrictions
on the space $G$. For example, it is a theorem of topology that any vector field on the
2-sphere must vanish at some point. That is why the 2-sphere has no group
structure. But the 3-sphere, being a group, has tangent vector fields which are nowhere
zero.


8. SIMPLE GROUPS 

Recall that a group $G$ is called simple if it is not the trivial group and if it contains
no proper normal subgroup (Chapter 6, Section 2). So far, we have seen two
nonabelian simple groups: the icosahedral group $I \cong A_5$ [Chapter 6 (2.3)] and the
rotation group $SO_3$ (7.10). This section discusses the classification of simple groups. We
will omit most proofs.

Simple groups are important for two reasons. First of all, if a group $G$ has a
proper normal subgroup $N$, then the structure of $G$ is partly described when we
know the structure of $N$ and of the quotient group $G/N$. If $N$ or $G/N$ has a normal
subgroup, we can further decompose the structure of these groups. In this way we
may hope to describe a particular finite group $G$ by building it up inductively from
simple groups.

Second, though the condition of being simple is a very strong restriction,
simple groups often appear. The classical linear groups are almost simple. For example,
we saw in the last section that $SU_2$ has center $\{\pm I\}$ and that $SU_2/\{\pm I\} \cong SO_3$ is a
simple group. The other classical groups have similar properties.

In order to focus attention, we will restrict our discussion here to the complex
groups. We will use the symbol $Z$ to denote the center of any group. The following
theorem would take too much time to prove here, but we will illustrate it in the
special case of $SL_2(\mathbb{C})$.

(8.1) Theorem.

(a) The center $Z$ of the special linear group $SL_n(\mathbb{C})$ is a cyclic group, generated by
the matrix $\zeta I$ where $\zeta = e^{2\pi i/n}$. The quotient group $SL_n(\mathbb{C})/Z$ is simple if
$n \geq 2$.

(b) The center $Z$ of the complex special orthogonal group $SO_n(\mathbb{C})$ is $\{\pm I\}$ if $n$ is
even, and is the trivial group $\{I\}$ if $n$ is odd. The group $SO_n/Z$ is simple if
$n = 3$ or if $n \geq 5$.

(c) The center $Z$ of the symplectic group $SP_{2n}(\mathbb{C})$ is $\{\pm I\}$, and $SP_{2n}(\mathbb{C})/Z$ is simple
if $n \geq 1$. □

The group $SL_n(\mathbb{C})/Z$ is called the projective group and is denoted by $PSL_n(\mathbb{C})$:

(8.2) $PSL_n(\mathbb{C}) = SL_n(\mathbb{C})/Z$, where $Z = \{\zeta I \mid \zeta^n = 1\}$.

To illustrate Theorem (8.1), we will prove that $PSL_2(\mathbb{C}) = SL_2(\mathbb{C})/\{\pm I\}$ is
simple. In fact, we will show that $PSL_2(F)$ is a simple group for almost all fields $F$.

(8.3) Theorem. Let $F$ be a field which is not of characteristic 2 and which
contains at least seven elements. Then the only proper normal subgroup of $SL_2(F)$ is the
subgroup $\{\pm I\}$. Thus $PSL_2(F) = SL_2(F)/\{\pm I\}$ is a simple group.

Since the center of $SL_2(F)$ is a normal subgroup, it follows from the theorem
that it is the group $\{\pm I\}$.

(8.4) Corollary. There are infinitely many nonabelian finite simple groups.

Proof of Theorem (8.3). The proof is algebraic, but it is closely related to the
geometric proof given for the analogous assertion for $SU_2$ in the last section. Our
procedure is to conjugate and multiply until the group is generated. To simplify
notation, we will denote $SL_2(F)$ by $SL_2$. Let $N$ be a normal subgroup of $SL_2$ which
contains a matrix $A \neq \pm I$. We must show that $N = SL_2$. Since one possibility is that $N$
is the normal subgroup generated by $A$ and its conjugates, we must show that the
conjugates of this one matrix suffice to generate the whole group.

The first step in our proof will be to show that $N$ contains a triangular matrix
different from $\pm I$. Now if our given matrix $A$ has eigenvalues in the field $F$, then it
will be conjugate to a triangular matrix. But since we want to handle arbitrary fields,
we cannot make this step so easily. Though easy for the complex numbers, this step
is the hardest part of the proof for a general field.


(8.5) Lemma. $N$ contains a triangular matrix $A \neq \pm I$.

Proof. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be a matrix in $N$ which is different from $\pm I$. If
$c = 0$, then $A$ is the required matrix.

Suppose that $c \neq 0$. In this case, we will construct a triangular matrix out of $A$
and its conjugates. We first compute the conjugate

$\begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 1 & -x \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a + xc & * \\ c & d - xc \end{bmatrix} = A'$.

Since $c \neq 0$, we may choose $x$ so that $a + xc = 0$. The matrix $A'$ is in $N$, so $N$
contains a matrix whose upper left entry is zero. We replace $A$ by this matrix, so that it
has the form $A = \begin{bmatrix} 0 & b \\ c & d \end{bmatrix}$. Unfortunately the zero is in the wrong place.

Note that since $\det A = 1$, $bc = -1$ in our new matrix $A$. We now compute the
commutator $P^{-1}A^{-1}PA$ with a diagonal matrix:

$P^{-1}A^{-1}PA = \begin{bmatrix} u & 0 \\ 0 & u^{-1} \end{bmatrix} \begin{bmatrix} d & -b \\ -c & 0 \end{bmatrix} \begin{bmatrix} u^{-1} & 0 \\ 0 & u \end{bmatrix} \begin{bmatrix} 0 & b \\ c & d \end{bmatrix} = \begin{bmatrix} u^2 & (1 - u^2)bd \\ 0 & u^{-2} \end{bmatrix}$.

This matrix, which is in our normal subgroup $N$, is as required unless it is $\pm I$. If so,
then $u^2 = \pm 1$ and $u^4 = 1$. But we are free to form the matrix $P$ with an arbitrary
element $u$ in $F^\times$. We will show [Chapter 11 (1.8)] that the polynomial $x^4 - 1$ has at
most four roots in any field. So there are at most four elements $u \in F$ with $u^4 = 1$.
Our hypothesis is that $F^\times$ contains at least five elements. So we can choose $u \in F^\times$
with $u^4 \neq 1$. Then $P^{-1}A^{-1}PA$ is the required matrix. □
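
The commutator formula in the proof of Lemma (8.5) can be checked in a small field. The sketch below works in $\mathbb{Z}/7\mathbb{Z}$, with a sample matrix $A$ whose upper left entry is zero and with $u = 2$; the prime, the matrix entries, and the function names are assumptions chosen for illustration.

```python
P_MOD = 7  # sample prime; all arithmetic is in the field Z/7Z


def mul(X, Y):
    """Product of two 2x2 matrices with entries reduced mod P_MOD."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % P_MOD for j in range(2)]
            for i in range(2)]


def inv_sl2(X):
    # For det X = 1 the inverse is the adjugate [[d, -b], [-c, a]].
    (a, b), (c, d) = X
    return [[d % P_MOD, -b % P_MOD], [-c % P_MOD, a % P_MOD]]


def commutator_with_diagonal(A, u):
    """Return P^-1 A^-1 P A, where P^-1 = diag(u, u^-1) as in the proof."""
    u_inv = pow(u, -1, P_MOD)  # modular inverse (Python 3.8+)
    P = [[u_inv, 0], [0, u]]
    P_inv = [[u, 0], [0, u_inv]]
    return mul(mul(P_inv, inv_sl2(A)), mul(P, A))


b, c, d = 1, 6, 3            # bc = -1 (mod 7), so det A = 1
A = [[0, b], [c, d]]
u = 2                        # u^4 = 2 (mod 7), so u^4 != 1
result = commutator_with_diagonal(A, u)

# Closed form from the proof: [[u^2, (1 - u^2) b d], [0, u^-2]].
expected = [[u * u % P_MOD, (1 - u * u) * b * d % P_MOD],
            [0, pow(u, -2, P_MOD)]]
```

Here `result` agrees with `expected`, and it is an upper triangular matrix different from $\pm I$, as the lemma requires.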


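(8.6) Lemma. $N$ contains a matrix of the form $\begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix}$, with $u \neq 0$.

Proof. By the previous lemma, $N$ contains a triangular matrix
$A = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix} \neq \pm I$. If $d \neq a$, let $A' = \begin{bmatrix} a & b' \\ 0 & d \end{bmatrix}$ be its conjugate by the matrix
$\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Then $b' = b + d - a$. Since $\det A = ad = 1$, the product

$A'^{-1}A = \begin{bmatrix} d & -b' \\ 0 & a \end{bmatrix} \begin{bmatrix} a & b \\ 0 & d \end{bmatrix} = \begin{bmatrix} 1 & ad - d^2 \\ 0 & 1 \end{bmatrix}$

is the required matrix. If $a = d$, then $a = \pm 1$ because $\det A = 1$, and it follows that
$b \neq 0$. In this case, one of the two matrices $A$ or $A^2$ is as required. □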


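(8.7) Lemma. Let $F$ be a field. The conjugacy class in $SL_2$ of the matrix
$\begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix}$ contains the matrices $\begin{bmatrix} 1 & 0 \\ -u & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & a^2u \\ 0 & 1 \end{bmatrix}$, for all $a \neq 0$.

Proof.

$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -u & 1 \end{bmatrix}$ and $\begin{bmatrix} a & 0 \\ 0 & a^{-1} \end{bmatrix} \begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a^{-1} & 0 \\ 0 & a \end{bmatrix} = \begin{bmatrix} 1 & a^2u \\ 0 & 1 \end{bmatrix}$. □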


(8.8) Lemma. Let $F$ be a field of characteristic $\neq 2$. The additive group $F^+$ of the
field is generated by the squares of elements of $F$.

Proof. We show that every element $x \in F$ can be written in the form
$a^2 - b^2 = (a + b)(a - b)$, with $a, b \in F$. To do this, we solve the system of
linear equations $a + b = 1$, $a - b = x$. This is where the assumption that the
characteristic of $F$ is not 2 is used. In characteristic 2, these equations need not have a
solution. □
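
Explicitly, since 2 is invertible in $F$, the system can be solved by adding and subtracting the two equations. The following worked check is added as an illustration:

```latex
a = \tfrac{1}{2}(1 + x), \qquad b = \tfrac{1}{2}(1 - x), \qquad
a^2 - b^2 = (a + b)(a - b) = 1 \cdot x = x .
```

For instance, in $\mathbb{Z}/7\mathbb{Z}$ with $x = 3$ this gives $a = 2$ and $b = 6$, and indeed $4 - 36 = -32 \equiv 3 \pmod 7$.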


(8.9) Lemma. Let $F$ be a field of characteristic $\neq 2$. If a normal subgroup $N$ of
$SL_2(F)$ contains a matrix $\begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix}$ with $u \neq 0$, then it contains all such matrices.

Proof. The set of $x$ such that $\begin{bmatrix} 1 & x \\ 0 & 1 \end{bmatrix} \in N$ is a subgroup of $F^+$, call it $S$. We
want to show that $S = F^+$. Lemma (8.7) shows that if $u \in S$, then $a^2u \in S$ for all
$a \in F$. Since the squares generate $F^+$, the set of elements $\{a^2u \mid a \in F\}$ generates
the additive subgroup $F^+u$ of $F^+$, and this subgroup is equal to $F^+$ because $u$ is
invertible. Thus $S = F^+$, as required. □

(8.10) Lemma. For every field $F$, the group $SL_2(F)$ is generated by the
elementary matrices $\begin{bmatrix} 1 & u \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 \\ u & 1 \end{bmatrix}$.

Proof. We perform row reduction on a matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in SL_2(F)$, using
only the matrices of this form. We start work on the first column, reducing it to $(1, 0)^t$.
We eliminate the case $c = 0$ by adding the first row to the second if necessary. Then
we add a multiple of the second row to the first to change $a$ to 1. Finally, we clear
out the entry $c$. At this point, the matrix has the form $A' = \begin{bmatrix} 1 & b' \\ 0 & d' \end{bmatrix}$. Then
$d' = \det A' = \det A = 1$, and we can clear out the entry $b'$, ending up with the
identity matrix. Since we needed four operations or less to reduce to the identity, $A$
is a product of at most four of these elementary matrices. □

The proof of Theorem (8.3) is completed by combining Lemmas (8.6), (8.7),
(8.9), and (8.10). □
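
The row reduction in the proof of Lemma (8.10) is short enough to carry out mechanically. The following sketch follows the four steps of the proof over the field $\mathbb{Z}/7\mathbb{Z}$; the prime, the function names, and the sample matrix are assumptions made for illustration, and the input is assumed to have determinant 1 with entries already reduced mod 7.

```python
P_MOD = 7  # sample prime (an assumption); the field is F = Z/7Z


def mul(X, Y):
    """Product of two 2x2 matrices with entries reduced mod P_MOD."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % P_MOD for j in range(2)]
            for i in range(2)]


def upper(u):
    return [[1, u % P_MOD], [0, 1]]   # adds u * (row 2) to row 1


def lower(u):
    return [[1, 0], [u % P_MOD, 1]]   # adds u * (row 1) to row 2


def reduce_to_identity(A):
    """Return (steps, M): the elementary matrices applied on the left,
    in order, and the matrix M obtained after applying all of them to A."""
    M = [row[:] for row in A]
    steps = []

    def apply(E):
        nonlocal M
        M = mul(E, M)
        steps.append(E)

    if M[1][0] == 0:                  # make c nonzero if necessary
        apply(lower(1))
    t = (1 - M[0][0]) * pow(M[1][0], -1, P_MOD)
    apply(upper(t))                   # change a to 1
    apply(lower(-M[1][0]))            # clear out the entry c
    apply(upper(-M[0][1]))            # clear out the entry b'
    return steps, M


A = [[2, 3], [0, 4]]                  # det A = 8 = 1 (mod 7)
steps, M = reduce_to_identity(A)
```

For this sample matrix, `M` is the identity after four steps. Since the inverse of an elementary matrix of either type is again of the same type, inverting the steps in reverse order expresses $A$ as a product of at most four elementary matrices.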

A famous theorem of Cartan asserts that the list (8.1) of simple groups is al¬ 
most complete. Of course there are other simple groups; for instance, we have just 
proved that PSL 2 (F) is simple for most fields F. But if we restrict ourselves to com¬ 
plex algebraic groups, the list of simple groups becomes very short. 

A subgroup $G$ of $GL_n(\mathbb{C})$ is called a complex algebraic group if it is the set of
solutions of a finite system of polynomial equations in the matrix entries. This is
analogous to the concept of a real algebraic group introduced in Section 6. It will
not be apparent why the property of being defined by polynomial equations is a
reasonable one, but one thing is easy to see: Except for the unitary groups $U_n$ and $SU_n$,
all the complex classical groups are complex algebraic groups.

(8.11) Theorem.

(a) The groups $PSL_n(\mathbb{C}) = SL_n(\mathbb{C})/Z$, $SO_n(\mathbb{C})/Z$, and $SP_{2n}(\mathbb{C})/Z$ are
path-connected complex algebraic groups.

(b) In addition to the isomorphism classes of these groups, there are exactly five
isomorphism classes of simple, path-connected complex algebraic groups,
called the exceptional groups.

Theorem (8.11) is too hard to prove here. It is based on a classification of the
corresponding Lie algebras. What we should learn is that there are not many simple
algebraic groups. This ought to be reassuring after the last chapter, where structures
on a vector space were introduced one after the other, each with its own group of
symmetries. There seemed to be no end. Now we see that we actually ran across
most of the possible symmetry types, at least those associated to simple algebraic
groups. It is no accident that these structures are important.

A large project, the classification of the finite simple groups, was completed in
1980. The finite simple groups we have seen are the groups of prime order, the
icosahedral group $I \cong A_5$ [Chapter 6 (2.3)], and the groups $PSL_2(F)$ where $F$ is a
finite field (8.3), but there are many more. The alternating groups $A_n$ are simple for
all $n \geq 5$.

Linear groups play a dominant role in the classification of the finite simple 
groups as well as of the complex algebraic groups. Each of the forms (8.11) leads to 
a whole series of finite simple groups when finite fields are substituted for the com¬ 
plex field. Also, some finite simple groups are analogous to the unitary groups. All 
of these finite linear groups are said to be of Lie type. 

According to Theorem (8.3), $PSL_2(\mathbb{F}_7)$ is a finite simple group; its order is 168.
This is the second smallest nonabelian simple group; $A_5$ is the smallest. The orders of the
smallest nonabelian simple groups are

(8.12) 60, 168, 360, 504, 660, 1092, 2448.

For each of these seven integers $n$, there is a single isomorphism class of simple
groups of order $n$, and it is represented by $PSL_2(F)$ for a suitable finite field $F$. [The
alternating group $A_5$ happens to be isomorphic to $PSL_2(\mathbb{F}_5)$.]
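
The list (8.12) can be reproduced from the standard order formula $|PSL_2(\mathbb{F}_q)| = q(q^2 - 1)/\gcd(2, q - 1)$. This formula is not stated in the text, and the particular field sizes used below are a choice made here to match the seven orders; the sketch is only an arithmetic check.

```python
from math import gcd


def psl2_order(q):
    """Order of PSL_2(F_q): |SL_2(F_q)| = q(q^2 - 1), divided by the order of
    the center {+-I}, which is gcd(2, q - 1)."""
    return q * (q * q - 1) // gcd(2, q - 1)


# Field sizes assumed here to realize the orders in (8.12), in that order.
field_sizes = (5, 7, 9, 8, 11, 13, 17)
orders = [psl2_order(q) for q in field_sizes]
```

With these field sizes, `orders` equals `[60, 168, 360, 504, 660, 1092, 2448]`. Note that $q = 8$ and $q = 9$ are prime powers, so the corresponding fields are not of the form $\mathbb{Z}/p\mathbb{Z}$.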

In addition to the groups of prime order, the alternating groups, and the
groups of Lie type, there are exactly 26 finite simple groups called the sporadic
groups. The smallest sporadic group is the Mathieu group $M_{11}$, whose order is 7920.
The largest is called the Monster; its order is roughly $10^{53}$. So the finite simple
groups form a list which, though longer, is somewhat analogous to the list (8.11) of
simple algebraic groups.


It seems unfair to crow about the successes of a theory 

and to sweep all its failures under the rug. 

Richard Brauer 


EXERCISES 

1. The Classical Linear Groups

1. (a) Find a subgroup of $GL_2(\mathbb{R})$ which is isomorphic to $\mathbb{C}^\times$.

(b) Prove that for every $n$, $GL_n(\mathbb{C})$ is isomorphic to a subgroup of $GL_{2n}(\mathbb{R})$.

2. Show that $SO_2(\mathbb{C})$ is not a bounded set in $\mathbb{C}^4$.

3. Prove that $SP_2(\mathbb{R}) = SL_2(\mathbb{R})$, but that $SP_4(\mathbb{R}) \neq SL_4(\mathbb{R})$.


4. According to Sylvester's Law, every $2 \times 2$ real symmetric matrix is congruent to exactly
one of six standard types. List them. If we consider the operation of $GL_2(\mathbb{R})$ on $2 \times 2$
matrices by $P, A \mapsto PAP^t$, then Sylvester's Law asserts that the symmetric matrices form
six orbits. We may view the symmetric matrices as points in $\mathbb{R}^3$, letting $(x, y, z)$
correspond to the matrix $\begin{bmatrix} x & y \\ y & z \end{bmatrix}$. Find the decomposition of $\mathbb{R}^3$ into orbits explicitly, and
make a clear drawing showing the resulting geometric configuration.


5. A matrix $P$ is orthogonal if and only if its columns form an orthonormal basis. Describe
the properties that the columns of a matrix must have in order for it to be in the Lorentz
group $O_{3,1}$.

6. Prove that there is no continuous isomorphism from the orthogonal group $O_4$ to the
Lorentz group $O_{3,1}$.

7. Describe by equations the group $O_{1,1}$, and show that it has four connected components.

8 * Describe the orbits for the operation of on the space of real symmetric matrices 

by $P, A \mapsto PAP^t$.


9. Let $F$ be a field whose characteristic is not 2. Describe the orbits for the action
$P, A \mapsto PAP^t$ of $GL_2(F)$ on the space of $2 \times 2$ symmetric matrices with coefficients in $F$.

10. Let $F = \mathbb{F}_2$. Classify the orbits of $GL_n(F)$ for the action on the space of symmetric $n \times n$
matrices by finding representatives for each congruence class.

11. Prove that the following matrices are symplectic, if the blocks are $n \times n$:

, where $B = B^t$ and $A$ is invertible.

12. Prove that the symplectic group $SP_{2n}(\mathbb{R})$ operates transitively on $\mathbb{R}^{2n}$.

*13. Prove that $SP_{2n}(\mathbb{R})$ is path-connected, and conclude that every symplectic matrix has
determinant 1.


2. The Special Unitary Group $SU_2$

1. Let $P, Q$ be elements of $SU_2$, represented by the real vectors $(x_1, x_2, x_3, x_4)$, $(y_1, y_2, y_3, y_4)$.
Compute the real vector which corresponds to the product $PQ$.

2. Prove that the subgroup $SO_2$ of $SU_2$ is conjugate to the subgroup $T$ of diagonal matrices.

3. Prove that $SU_2$ is path-connected. Do the same for $SO_3$.

4. Prove that $U_2$ is homeomorphic to the product $S^3 \times S^1$.

5. Let $G$ be the group of matrices of the form $\begin{bmatrix} x & y \\ 0 & 1 \end{bmatrix}$, where $x, y \in \mathbb{R}$ and $x > 0$.
Determine the conjugacy classes in $G$, and draw them in the $(x, y)$-plane.

*6. (a) Prove that every element $P$ (2.4) of $SU_2$ can be written as a product: $P = DRD'$,
where $D, D' \in T$ (2.13), and $R \in SO_2$ is a rotation through an angle $\theta$ with
$0 \leq \theta \leq \pi/2$.

(b) Assume that the matrix entries $a, b$ of $P$ are not zero. Prove that this representation is
unique, except that the pair $D, D'$ can be replaced by $-D, -D'$.

(c) Describe the double cosets $TPT$, $P \in SU_2$. Prove that if the entries $a, b$ of $P$ are not
zero, then the double coset is homeomorphic to a torus, and describe the remaining
double cosets.


3. The Orthogonal Representation of SU 2

1 . Compute the stabilizer H of the matrix ^ 1 for the action of conjugation by SU2, and 

— 

describe $\varphi(P)$ for $P \in H$.

2. Prove that every great circle in $SU_2$ is a coset of one of the longitudes (2.14).

3. Find a subset of $\mathbb{R}^3$ which is homeomorphic to the space $S \times \Theta$ of (3.4).

4. Derive a formula for (A, A) in terms of the determinant of A. 

5. The rotation group $SO_3$ may be mapped to the 2-sphere by sending a rotation matrix to
its first column. Describe the fibres of this map.

6. Extend the map $\varphi$ defined in this section to a homomorphism $\Phi: U_2 \to SO_3$, and
describe the kernel of $\Phi$.

7. Prove by direct computation that the matrix (3.11) is in SO 3 . 

*8. Describe the conjugacy classes in $SO_3$ carefully, relating them to the conjugacy classes of
$SU_2$.

9. Prove that the operation of $SU_2$ on any conjugacy class other than $\{I\}$, $\{-I\}$ is by
rotations of the sphere.

10. Find a bijective correspondence between elements of $SO_3$ and pairs $(p, v)$ consisting of a
point $p$ on the unit 2-sphere $S$ and a unit tangent vector $v$ to $S$ at $p$.

11. Prove Proposition (3.20).

*12. (a) Calculate left multiplication by a fixed matrix $P$ in $SU_2$ explicitly in terms of the
coordinates $x_1, x_2, x_3, x_4$. Prove that it is multiplication by a $4 \times 4$ orthogonal matrix
$Q$, hence that it is a rigid motion of the unit 3-sphere $S^3$.

(b) Prove that $Q$ is orthogonal by a method similar to that used in describing the orthogonal
representation: Express dot product of the vectors $(x_1, x_2, x_3, x_4)$, $(x_1', x_2', x_3', x_4')$
corresponding to two matrices $P, P' \in SU_2$ in terms of matrix operations.

(c) Determine the matrix which describes the operation of conjugation by a fixed matrix
$P$ on $SU_2$.

*13. (a) Let $H_i$ be the subgroup of $SO_3$ of rotations about the $x_i$-axis, $i = 1, 2, 3$. Prove that
every element of $SO_3$ can be written as a product $ABA'$, where $A, A' \in H_1$ and
$B \in H_2$. Prove that this representation is unique unless $B = I$.

(b) Describe the double cosets $H_1 Q H_1$ geometrically.

*14. Let $H_i$ be the subgroup of $SO_3$ of rotations about the $x_i$-axis. Prove that every element
$Q \in SO_3$ can be written in the form $A_1 A_2 A_3$, with $A_i \in H_i$.

4. The Special Linear Group SL 2 (R)

1. Let $G = SL_2(\mathbb{C})$. Use the operation of $G$ on rays $\{rX \mid r \in \mathbb{R},\ r > 0\}$ in $\mathbb{C}^2$ to prove that
$G$ is homeomorphic to the product $SU_2 \times H$, where $H$ is the stabilizer of the ray $\{re_1\}$, and
describe $H$ explicitly.

2. (a) Prove that the rule $P, A \mapsto PAP^*$ defines an operation of $SL_2(\mathbb{C})$ on the space $W$ of
all hermitian matrices.

(b) Prove that the function $\langle A, A' \rangle = \det(A + A') - \det A - \det A'$ is a bilinear form on
$W$, whose signature is (3,1).

(c) Use (a) and (b) to define a homomorphism $\varphi: SL_2(\mathbb{C}) \to O_{3,1}$, whose kernel is
$\{\pm I\}$.

*(d) Prove that the image of $\varphi$ is the connected component of the identity in $O_{3,1}$.

3. Let $P$ be a matrix in $SO_3(\mathbb{C})$.

(a) Prove that 1 is an eigenvalue of $P$.

(b) Let $X_1, X_2$ be eigenvectors for $P$, with eigenvalues $\lambda_1, \lambda_2$. Prove that $X_1^t X_2 = 0$,
unless $\lambda_1 = \lambda_2^{-1}$.

(c) Prove that if $X$ is an eigenvector with eigenvalue 1 and if $P \ne I$, then $X^t X \ne 0$.

4. Let $G = SO_3(\mathbb{C})$.

(a) Prove that left multiplication by $G$ is a transitive operation on the set of vectors $X$
such that $X^t X = 1$.

(b) Determine the stabilizer of $e_1$ for left multiplication by $G$.

(c) Prove that $G$ is path-connected.

5. One-Parameter Subgroups

1. Determine the differentiable homomorphisms from $\mathbb{C}^+$ to $SL_n(\mathbb{C})$.

2. Describe all one-parameter subgroups of $\mathbb{C}^\times$.

3. Describe by equations the images of all one-parameter subgroups of the group of real
$2 \times 2$ diagonal matrices, and make a neat drawing showing them.

4. Let $\varphi: \mathbb{R}^+ \to GL_n(\mathbb{R})$ be a one-parameter subgroup. Prove that $\ker \varphi$ is either trivial,
or the whole group, or else it is infinite cyclic.

5. Find the conditions on a matrix $A$ so that $e^{tA}$ is a one-parameter subgroup of the special
unitary group $SU_n$, and compute the dimension of that group.


6. Let $G$ be the group of real matrices of the form $\begin{bmatrix} x & y \\ 0 & 1 \end{bmatrix}$, $x > 0$.

(a) Determine the matrices $A$ such that $e^{tA}$ is a one-parameter subgroup of $G$.

(b) Compute $e^A$ explicitly for the matrices determined in (a).

(c) Make a drawing showing the one-parameter subgroups in the $(x, y)$-plane.


7. Prove that the images of the one-parameter subgroups of SU 2 are the conjugates of T (see
Section 3). Use this to give an alternative proof of the fact that these conjugates are the 
longitudes. 

8. Determine the one-parameter subgroups of Ui. 

9. Let $\varphi(t) = e^{tA}$ be a one-parameter subgroup of $G$. Prove that the cosets of $\operatorname{im} \varphi$ are
matrix solutions of the differential equation $dX/dt = AX$.

10. Can a one-parameter subgroup of $GL_n(\mathbb{R})$ cross itself?

*11. Determine the differentiable homomorphisms from $SO_2$ to $GL_n(\mathbb{R})$.


6. The Lie Algebra 


1. Compute $(A + B\epsilon)^{-1}$, assuming that $A$ is invertible.

2. Compute the infinitesimal tangent vectors to the plane curve $y^2 = x^3$ at the point $(1,1)$
and at the point $(0,0)$.

3. (a) Sketch the curve $C$: $x_2^2 = x_1^3 - x_1^2$.

(b) Prove that this locus is a manifold of dimension 1 if the origin is deleted.

(c) Determine the tangent vectors and the infinitesimal tangents to $C$ at the origin.


4. Let $S$ be a real algebraic set defined by one equation $f = 0$.

(a) Show that the equation $f^2 = 0$ defines the same locus $S$.

(b) Show that $\nabla(f^2)$ vanishes at every point $x$ of $S$, hence that every vector is an
infinitesimal tangent at $x$, when the defining equation is taken to be $f^2 = 0$.

5. Show that the set defined by $xy = 1$ is a subgroup of the group of diagonal matrices
$\begin{bmatrix} x & \\ & y \end{bmatrix}$, and compute its Lie algebra.


6. Determine the Lie algebra of the unitary group. 

7. (a) Prove the formula $\det(I + A\epsilon) = 1 + \operatorname{trace} A\epsilon$.

(b) Let $A$ be an invertible matrix. Compute $\det(A + B\epsilon)$.

8. (a) Show that $O_2$ operates by conjugation on its Lie algebra.

(b) Show that the operation in (a) is compatible with the bilinear form = 

3 trace AB. 

(c) Use the operation in (a) to define a homomorphism Oi - > Oi % and describe this ho¬ 

momorphism explicitly. 

9. Compute the Lie algebra of the following: (a) $U_n$; (b) $SU_n$; (c) $O_{3,1}$; (d) $SO_n(\mathbb{C})$. In each
case, show that $e^{tA}$ is a one-parameter subgroup if and only if $I + A\epsilon$ lies in the group.

*10. Determine the Lie algebra of $G = SP_{2n}(\mathbb{R})$, using block form $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$.

11. (a) Show that $\mathbb{R}^3$ becomes a Lie algebra if the bracket is defined to be the cross product

$[x, y] = x \times y = (x_2 y_3 - x_3 y_2,\ x_3 y_1 - x_1 y_3,\ x_1 y_2 - x_2 y_1)$.

(b) Show that this Lie algebra is isomorphic to the Lie algebra of $SO_3$.

12. Classify all complex Lie algebras of dimension $\le 3$.

*13. The adjoint representation of a linear group $G$ is the representation by conjugation on its
Lie algebra: $G \times L \to L$ is defined to be $P, A \mapsto PAP^{-1}$. The form $\langle A, A' \rangle = \operatorname{trace}(AA')$
on $L$ is called the Killing form. For each of the following groups, verify that if $P \in G$
and $A \in L$, then $PAP^{-1} \in L$, and prove that the Killing form is symmetric and bilinear
and that the operation is compatible with the form, i.e., that $\langle A, A' \rangle = \langle PAP^{-1}, PA'P^{-1} \rangle$.

(a) $SO_n$ (b) $SU_n$ (c) $O_{3,1}$ (d) $SO_n(\mathbb{C})$ (e) $SP_{2n}(\mathbb{R})$

14. Prove that the Killing form is negative definite on the Lie algebra of (a) SU n and (b) SO n . 

15. Determine the signature of the Killing form on the Lie algebra of 

16. (a) Use the adjoint representation of $SU_n$ to define a homomorphism $\varphi: SU_n \to SO_m$,
where $m = n^2 - 1$.

(b) Show that when $n = 2$, this representation is equivalent to the orthogonal
representation defined in Section 3.

17. Use the adjoint representation of $SL_2(\mathbb{C})$ to define an isomorphism $SL_2(\mathbb{C})/\{\pm I\} \approx SO_3(\mathbb{C})$.


7. Translation in a Group


1. Compute the dimensions of the following groups.

(a) $SU_n$ (b) $SO_n(\mathbb{C})$ (c) $SP_{2n}(\mathbb{R})$ (d) $O_{3,1}$

2. Using the exponential, find all solutions near $I$ of the equation $P^2 = I$.

3. Find a path-connected, nonabelian subgroup of $GL_2(\mathbb{R})$ of dimension 2.

4. (a) Show that every positive definite hermitian matrix $A$ is the square of another positive
definite hermitian matrix $B$.

(b) Show that $B$ is uniquely determined by $A$.

*5. Let $A$ be a nonsingular matrix, and let $B$ be a positive definite hermitian matrix such that
$B^2 = AA^*$.

(a) Show that A 一 1 is unitary. 

(b) Prove the Polar decomposition: Every nonsingular matrix $A$ is a product $A = PU$,
where $P$ is positive definite hermitian and $U$ is unitary.

(c) Prove that the Polar decomposition is unique.

(d) What does this say about the operation of left multiplication by the unitary group $U_n$
on the group $GL_n$?

*6. State and prove an analogue of the Polar decomposition for real matrices. 

*7. (a) Prove that the exponential map defines a bijection between the set of all hermitian 
matrices and the set of positive definite hermitian matrices. 

(b) Describe the topological structure of GL 2 (C) using the Polar decomposition and (a). 

8. Let $B$ be an invertible matrix. Describe the matrices $A$ such that $P = e^A$ is in the
centralizer of $B$.


*9. Let $S$ denote the set of matrices $P \in SL_2(\mathbb{R})$ with trace $r$. These matrices can be written
in the form $\begin{bmatrix} x & y \\ z & r - x \end{bmatrix}$, where $(x, y, z)$ lies on the quadric $x(r - x) - yz = 1$.

(a) Show that the quadric is either a hyperboloid of one or two sheets, or else a cone, and
determine the values of $r$ which correspond to each type.

(b) In each case, determine the decomposition of the quadric into conjugacy classes.

(c) Extend the method of proof of Theorem (7.11) to show that the only proper normal
subgroup of $SL_2(\mathbb{R})$ is $\{\pm I\}$.

10. Draw the tangent vector field $PA$ to the group $\mathbb{C}^\times$, when $A = 1 + i$.


8. Simple Groups 


1. Which of the following subgroups of GL n (C) are complex algebraic groups? 

(a) GL n (Z) (b) SU n (c) upper triangular matrices 

2. (a) Write the polynomial functions in the matrix entries which define SO n (C) . 

(b) Write out the polynomial equations which define the symplectic group. 

(c) Show that the unitary group U n can be defined by real polynomial equations in the 
real and imaginary parts of the matrix entries. 

3. Determine the centers of the groups $SL_n(\mathbb{R})$ and $SL_n(\mathbb{C})$.

4. Describe isomorphisms (a) $PSL_2(\mathbb{F}_2) \approx S_3$ and (b) $PSL_2(\mathbb{F}_3) \approx A_4$.

5. Determine the conjugacy classes of GLU. 

6. Prove that $SL_2(F) = PSL_2(F)$ for any field $F$ of characteristic 2.

7. (a) Determine all normal subgroups of $GL_2(\mathbb{C})$ which contain its center $Z = \{cI\}$.

(b) Do the same for $GL_2(\mathbb{R})$.

8. For each of the seven orders (8.12), determine the order of the field F such that PSL 2 (F) 
has order n. 


*9. Prove that there is a simple group of order 3420.

10. (a) Let $Z$ be the center of $GL_n(\mathbb{C})$. Is $PSL_n(\mathbb{C})$ isomorphic to $GL_n(\mathbb{C})/Z$?

(b) Answer the same question as in (a), with $\mathbb{R}$ replacing $\mathbb{C}$.

11. Prove that $PSL_2(\mathbb{F}_5)$ is isomorphic to $A_5$.

*12. Analyze the proof of Theorem (8.3) to prove that $PSL_2(F)$ is a simple group when $F$ is a
field of characteristic 2, except for the one case $F = \mathbb{F}_2$.

13. (a) Let $P$ be a matrix in the center of $SO_n$, and let $A$ be a skew-symmetric matrix. Prove
that $PA = AP$ by differentiating the matrix function $e^{At}$.

(b) Prove that the center of $SO_n$ is trivial if $n$ is odd and is $\{\pm I\}$ if $n$ is even and $\ge 4$.

14. Compute the orders of the following groups.

(a) $SO_2(\mathbb{F}_3)$ and $SO_3(\mathbb{F}_3)$

(b) $SO_2(\mathbb{F}_5)$ and $SO_3(\mathbb{F}_5)$


*15. (a) Consider the operation of $SL_2(\mathbb{C})$ by conjugation on the space $V$ of complex $2 \times 2$
matrices. Show that with the basis $(e_{11}, e_{12}, e_{21}, e_{22})$ of $V$, the matrix of conjugation
by $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has the block form $\begin{bmatrix} aB & bB \\ cB & dB \end{bmatrix}$, where $B = (A^t)^{-1} = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}$.

(b) Prove that this operation defines a homomorphism $\varphi: SL_2(\mathbb{C}) \to GL_4(\mathbb{C})$, and that
the image of $\varphi$ is isomorphic to $PSL_2(\mathbb{C})$.

(c) Prove that $PSL_2(\mathbb{C})$ is an algebraic group by finding polynomial equations in the
entries $y_{ij}$ of a $4 \times 4$ matrix whose solutions are precisely the matrices in $\operatorname{im} \varphi$.

*16. Prove that PSL n (C) is a simple group. 

*17. There is no simple group of order $2^5 \cdot 7 \cdot 11$. Assuming this, determine the next smallest
order after 2448 for a nonabelian simple group.

Miscellaneous Exercises


1. Quaternions are expressions of the form $\alpha = a + bi + cj + dk$, where $a, b, c, d \in \mathbb{R}$.
They can be added and multiplied using the rules of multiplication for the quaternion
group [Chapter 2 (2.12)].

(a) Let $\bar{\alpha} = a - bi - cj - dk$. Compute $\alpha\bar{\alpha}$.

(b) Prove that every $\alpha \ne 0$ has a multiplicative inverse.

(c) Prove that the set of quaternions $\alpha$ such that $a^2 + b^2 + c^2 + d^2 = 1$ forms a group
under multiplication which is isomorphic to $SU_2$.

2. The affine group $A_n = A_n(\mathbb{R})$ is the group of coordinate changes in $(x_1, \ldots, x_n)$ which is
generated by $GL_n(\mathbb{R})$ and by the group $T_n$ of translations: $t_a(x) = x + a$. Prove that $T_n$ is
a normal subgroup of $A_n$ and that $A_n/T_n$ is isomorphic to $GL_n(\mathbb{R})$.


3. Cayley Transform: Let $U$ denote the set of matrices $A$ such that $I + A$ is invertible, and
define $A' = (I - A)(I + A)^{-1}$.

(a) Prove that if $A \in U$, then $A' \in U$, and prove that $A'' = A$.

(b) Let $V$ denote the vector space of real skew-symmetric $n \times n$ matrices. Prove that the
rule $A \mapsto (I - A)(I + A)^{-1}$ defines a homeomorphism from a neighborhood of 0 in
$V$ to a neighborhood of $I$ in $SO_n$.

(c) Find an analogous statement for the unitary group.

(d) Let $J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}$. Show that a matrix $A \in U$ is symplectic if and only if
$A'^t J = -JA'$.

*4. Let $p(t) = t^2 - ut + 1$ be a quadratic polynomial, with coefficients in the field $F = \mathbb{F}_p$.

(a) Prove that if $p$ has two distinct roots in $F$, then the matrices with characteristic
polynomial $p$ form two conjugacy classes in $SL_2(F)$, and determine their orders.

(b) Prove that if $p$ has two equal roots, then the matrices with characteristic polynomial
$p$ form three conjugacy classes in $SL_2(F)$, and determine their orders.

(c) Suppose that p has no roots in F. Determine the centralizer of the matrix 


A = 


in SL 2 (F), and compute the order of the conjugacy class of A. 


(d) Find the class equations of $SL_2(\mathbb{F}_3)$ and $SL_2(\mathbb{F}_5)$.

(e) Find the class equations of $PSL_2(\mathbb{F}_3)$ and $PSL_2(\mathbb{F}_5)$, and reconcile your answer with
the class equations of $A_4$ and $A_5$.

(f) Compute the class equation for $SL_2(\mathbb{F}_7)$ and for $PSL_2(\mathbb{F}_7)$. Use the class equation for
$PSL_2(\mathbb{F}_7)$ to show that this group is simple.
A tremendous effort has been made by mathematicians
for more than a century to clear up the chaos in group theory.
Still, we cannot answer some of the simplest questions.

Richard Brauer


1. DEFINITION OF A GROUP REPRESENTATION

Operations of a group on an arbitrary set were studied in Chapter 5. In this chapter
we consider the case that the group elements act as linear operators on a
vector space. Such an operation defines a homomorphism from G to the general
linear group. A homomorphism to the general linear group is called a matrix
representation.

The finite rotation groups are good examples to keep in mind. The group T of
rotations of a tetrahedron, for example, operates on a three-dimensional space V by
rotations. We didn't write down the matrices which represent this action explicitly in
Chapter 5; let us do so now. A natural choice of basis has the coordinate axes
passing through the midpoints of three of the edges, as illustrated below:



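(1.1) Figure.

Let $y_i \in T$ denote the rotation by $\pi$ around an edge, and let $x \in T$ denote rotation
by $2\pi/3$ around the front vertex. The matrices representing these operations are

(1.2) $R_{y_1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$, $R_{y_2} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$, $R_{y_3} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $R_x = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$.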

The rotations $\{y_i, x\}$ generate the group $T$, and the matrices $\{R_{y_i}, R_x\}$ generate an
isomorphic group of matrices.

It is also easy to write down matrices which represent the actions of $C_n$, $D_n$,
and $O$ explicitly, but $I$ is fairly complicated.

An n-dimensional matrix representation of a group $G$ is a homomorphism

(1.3) $R: G \to GL_n(F)$,

where $F$ is a field. We will use the notation $R_g$ for the image of $g$. So each $R_g$ is an
invertible matrix, and multiplication in $G$ carries over to matrix multiplication; that
is, $R_{gh} = R_g R_h$. The matrices (1.2) describe a three-dimensional matrix representation
of $T$. It happens to be faithful, meaning that $R$ is an injection and therefore maps
$T$ isomorphically to its image, a subgroup of $GL_3(\mathbb{R})$. Matrix representations are not
required to be faithful.
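
The rule $R_{gh} = R_g R_h$ can be checked mechanically for the tetrahedral example. The following Python sketch is only an illustration: it assumes that $R_{y_1}$ is the diagonal matrix with entries 1, -1, -1 and that $R_x$ is the cyclic permutation matrix of the coordinate axes, and the helper names `matmul` and `generate` are made up for this purpose. It multiplies the two generators together until no new matrices appear.

```python
def matmul(A, B):
    """Product of two square matrices stored as tuples of tuples."""
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

# Assumed generators: rotation by pi about the first coordinate axis,
# and rotation by 2*pi/3 about the direction (1, 1, 1).
R_Y1 = ((1, 0, 0), (0, -1, 0), (0, 0, -1))
R_X = ((0, 0, 1), (1, 0, 0), (0, 1, 0))

def generate(generators):
    """Collect all products of the generators (terminates for a finite group)."""
    group = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = matmul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

T_MATRICES = generate([R_Y1, R_X])
print(len(T_MATRICES))
```

Under these assumptions the set contains 12 matrices, which agrees with the order of the tetrahedral group, and it is closed under multiplication by construction.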

When we study representations, it is essential to work as much as possible
without fixing a basis, and to facilitate this, we introduce the concept of a
representation of a group on a finite-dimensional vector space V. We denote by

(1.4) GL(V) 

the group of invertible linear operators on V, the multiplication law being, as always, 
composition of functions. The choice of a basis of V defines an isomorphism of this 
group with the group of invertible matrices: 

(1.5) $GL(V) \to GL_n(F)$, $T \mapsto$ matrix of $T$.

By a representation of $G$ on $V$, we mean a homomorphism

(1.6) $\rho: G \to GL(V)$.

The dimension of the representation $\rho$ is defined to be the dimension of the vector
space $V$. We will study only representations on finite-dimensional vector spaces.

Matrix representations can be thought of as representations of $G$ on the space
$F^n$ of column vectors.

Let $\rho$ be a representation. We will denote the image of an element $g$ in $GL(V)$
by $\rho_g$. Thus $\rho_g$ is a linear operator on $V$, and $\rho_{gh} = \rho_g \rho_h$. If a basis $B = (v_1, \ldots, v_n)$
is given, the representation $\rho$ defines a matrix representation $R$ by the rule

(1.7) $R_g$ = matrix of $\rho_g$.

We may write this matrix symbolically, as in Chapter 4 (3.1), as

(1.8) $\rho_g(B) = B R_g$.

If $X$ is the coordinate vector of a vector $v \in V$, that is, if $v = BX$, then

(1.9) $R_g X$ is the coordinate vector of $\rho_g(v)$.

The rotation groups are examples of representations on a real vector space $V$
without regard to a choice of basis. The rotations are linear operators in $GL(V)$. In
(1.1) we chose a basis for $V$, thereby realizing the elements of $T$ as the matrices
(1.2) and obtaining a matrix representation.

So all representations of $G$ on finite-dimensional vector spaces can be reduced
to matrix representations if we are willing to choose a basis. We may need to choose
one in order to make explicit calculations, but then we must study what happens
when we change our basis, which properties are independent of the choice of basis,
and which choices are the good ones.

A change of basis in $V$ given by a matrix $P$ changes a matrix representation $R$
to a conjugate representation $R' = PRP^{-1}$, that is,

(1.10) $R'_g = P R_g P^{-1}$ for every $g$.

This follows from rule (3.4) in Chapter 4 for change of basis.
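
That conjugation by a fixed matrix again gives a matrix representation can be seen in a small computation. The Python sketch below is an illustration under stated assumptions: the group is the cyclic group of order 3 written additively as {0, 1, 2}, the matrix `A` of order 3 and the change-of-basis matrix `P` are arbitrary choices, and the helper names are hypothetical. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two square matrices stored as tuples of tuples."""
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def inverse_2x2(P):
    """Inverse of an invertible 2 x 2 matrix with Fraction entries."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

I2 = ((F(1), F(0)), (F(0), F(1)))
A = ((F(-1), F(-1)), (F(1), F(0)))  # chosen so that A^3 = I

# Assumed representation R of the cyclic group {0, 1, 2} under addition mod 3.
R = {0: I2, 1: A, 2: matmul(A, A)}

# An arbitrary invertible change-of-basis matrix.
P = ((F(1), F(2)), (F(0), F(1)))
P_INV = inverse_2x2(P)

# The conjugate representation R'_g = P R_g P^{-1}.
R_CONJ = {g: matmul(matmul(P, R[g]), P_INV) for g in R}

IS_HOMOMORPHISM = all(
    matmul(R_CONJ[g], R_CONJ[h]) == R_CONJ[(g + h) % 3]
    for g in R for h in R
)
print(IS_HOMOMORPHISM)
```

The check succeeds because the inner factors $P^{-1}P$ cancel in every product $R'_g R'_h$.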

There is an equivalent concept, namely that of operation of a group $G$ on a
vector space $V$. When we speak of an operation on a vector space, we always mean
one which is compatible with the vector space structure; otherwise we shouldn't be
thinking of $V$ as a vector space. So such an operation is a group operation in the
usual sense [Chapter 5 (5.1)]:

(1.11) $1v = v$ and $(gh)v = g(hv)$,

for all $g, h \in G$ and all $v \in V$. In addition, every group element is required to act
on $V$ as a linear operator. Writing out what this means, we obtain the rules

(1.12) $g(v + v') = gv + gv'$ and $g(cv) = cgv$

which, when added to (1.11), give a complete list of axioms for an operation of $G$ on
the vector space $V$. Since $G$ does operate on the underlying set of $V$, we can speak of
orbits and stabilizers as before.

The two concepts "operation of $G$ on $V$" and "representation of $G$ on $V$" are
equivalent for the same reason that an operation of a group $G$ on a set $S$ is equivalent
to a permutation representation (Chapter 5, Section 8): Given a representation $\rho$ of
$G$ on $V$, we define an operation by the rule

(1.13) $gv = \rho_g(v)$,

and conversely, given an operation, the same formula can be used to define the
operator $\rho_g$ for all $g \in G$. It is a linear operator because of (1.12), and the associative
law (1.11) shows that $\rho_g \rho_h = \rho_{gh}$.

Thus we have two notations (1.13) for the action of $g$ on $v$, and we will use
them interchangeably. The notation $gv$ is more compact, so we use it when possible.

In order to focus our attention, and because they are the easiest to handle, we
will concentrate on complex representations for the rest of this chapter. Therefore
the vector spaces $V$ which occur are to be interpreted as complex vector spaces, and
$GL_n$ will denote the complex general linear group $GL_n(\mathbb{C})$. Every real matrix
representation, such as the three-dimensional representation (1.2) of the rotation group
$T$, can be used to define a complex representation, simply by interpreting the real
matrices as complex matrices. We will do this without further comment.

2. G-INVARIANT FORMS AND UNITARY REPRESENTATIONS

A matrix representation $R: G \to GL_n$ is called unitary if all the matrices $R_g$ are
unitary, that is, if the image of the homomorphism $R$ is contained in the unitary group.
In other words, a unitary representation is a homomorphism

(2.1) $R: G \to U_n$

from $G$ to the unitary group.

In this section we prove the following remarkable fact about representations of
finite groups.

(2.2) Theorem.

(a) Every finite subgroup of $GL_n$ is conjugate to a subgroup of $U_n$.

(b) Every matrix representation $R: G \to GL_n$ of a finite group $G$ is conjugate to a
unitary representation. In other words, given $R$, there is a matrix $P \in GL_n$
such that $P R_g P^{-1} \in U_n$ for every $g \in G$.

(2.3) Corollary.

(a) Let $A$ be an invertible matrix of finite order in $GL_n$, that is, such that $A^r = I$
for some $r$. Then $A$ is diagonalizable: There is a $P \in GL_n$ so that $PAP^{-1}$ is
diagonal.

(b) Let $R: G \to GL_n$ be a representation of a finite group $G$. Then for every
$g \in G$, $R_g$ is a diagonalizable matrix.

Proof of the corollary. (a) The matrix $A$ generates a finite subgroup of $GL_n$.
By Theorem (2.2), this subgroup is conjugate to a subgroup of the unitary group.
Hence $A$ is conjugate to a unitary matrix. The Spectral Theorem for normal operators
[Chapter 7 (7.3)] tells us that every unitary matrix is diagonalizable. Hence $A$ is
diagonalizable.

(b) The second part of the corollary follows from the first, because every element
$g$ of a finite group has finite order. Since $R$ is a homomorphism, $R_g$ has finite
order too. □

The two parts of Theorem (2.2) are more or less the same. We can derive (a)
from (b) by considering the inclusion map of a finite subgroup into $GL_n$ as a matrix
representation of the group. Conversely, (b) follows by applying (a) to the image
of $R$.

In order to prove part (b), we restate it in basis-free terminology. Consider a
hermitian vector space $V$ (a complex vector space together with a positive definite
hermitian form $\langle\ ,\ \rangle$). A linear operator $T$ on $V$ is unitary if $\langle v, w \rangle = \langle T(v), T(w) \rangle$ for
all $v, w \in V$ [Chapter 7 (5.2)]. Therefore it is natural to call a representation
$\rho: G \to GL(V)$ unitary if $\rho_g$ is a unitary operator for all $g \in G$, that is, if

(2.4) $\langle v, w \rangle = \langle \rho_g(v), \rho_g(w) \rangle$,

for all $v, w \in V$ and all $g \in G$. The matrix representation $R$ (1.7) associated to a
unitary representation $\rho$ will be unitary in the sense of (2.1), provided that the basis
is orthonormal. This follows from Chapter 7 (5.2b).

To simplify notation, we will write condition (2.4) as

(2.5) $\langle v, w \rangle = \langle gv, gw \rangle$.

We now turn this formula around and view it as a condition on the form instead of
on the operation. Given a representation $\rho$ of $G$ on a vector space $V$, a form $\langle\ ,\ \rangle$ on $V$
is called G-invariant if (2.4), or equivalently, (2.5) holds.

(2.6) Theorem. Let $\rho$ be a representation of a finite group $G$ on a complex vector
space $V$. There exists a G-invariant, positive definite hermitian form $\langle\ ,\ \rangle$ on $V$.

Proof. We start with an arbitrary positive definite hermitian form on $V$; say
we denote it by $\{\ ,\ \}$. We will use this form to define a G-invariant form, by averaging
over the group. Averaging over $G$ is a general method which will be used again.
It was already used in Chapter 5 (3.2) to find a fixed point of a finite group operation
on the plane. The form $\langle\ ,\ \rangle$ we want is defined by the rule

(2.7) $\langle v, w \rangle = \frac{1}{N} \sum_{g \in G} \{gv, gw\}$,

where $N = |G|$ is the order of $G$. The normalization factor $1/N$ is customary but
unimportant. Theorem (2.6) follows from this lemma:

(2.8) Lemma. The form (2.7) is hermitian, positive definite, and G-invariant.

Proof. The verification of the first two properties is completely routine. For example,

$\{gv, g(w + w')\} = \{gv, gw + gw'\} = \{gv, gw\} + \{gv, gw'\}$.

Therefore

$\langle v, w + w' \rangle = \frac{1}{N} \sum_{g \in G} \{gv, g(w + w')\} = \frac{1}{N} \sum_{g \in G} \{gv, gw\} + \frac{1}{N} \sum_{g \in G} \{gv, gw'\} = \langle v, w \rangle + \langle v, w' \rangle$.

To show that the form $\langle\ ,\ \rangle$ is G-invariant, let $g_0$ be an element of $G$. We must
show that $\langle g_0 v, g_0 w \rangle = \langle v, w \rangle$ for all $v, w \in V$. By definition,

$\langle g_0 v, g_0 w \rangle = \frac{1}{N} \sum_{g \in G} \{g g_0 v, g g_0 w\}$.

There is an important trick for analyzing such a summation, based on the fact that
right multiplication by $g_0$ is a bijective map $G \to G$. As $g$ runs over the group, the
products $g g_0$ do too, in a different order. We change notation, substituting $g'$ for
$g g_0$. Then in the sum, $g'$ runs over the group. So we may as well write the sum as
being over $g' \in G$ rather than over $g \in G$. This merely changes the order in which
the sum is taken. Then

$\langle g_0 v, g_0 w \rangle = \frac{1}{N} \sum_{g \in G} \{g g_0 v, g g_0 w\} = \frac{1}{N} \sum_{g' \in G} \{g'v, g'w\} = \langle v, w \rangle$,

as required. Please think this reindexing trick through and understand it. □


Theorem (2.2) follows easily from Theorem (2.6). Any homomorphism
$R: G \to GL_n$ is the matrix representation associated to a representation (with
$V = \mathbb{C}^n$ and $B = E$). By Theorem (2.6), there is a G-invariant form $\langle\ ,\ \rangle$ on $V$, and
we choose an orthonormal basis for $V$ with respect to this form. The matrix
representation $R'$ obtained via this basis is conjugate to $R$ (1.10) and unitary [Chapter 7
(5.2)]. □


(2.9) Example. The matrix $A = \begin{bmatrix} -1 & -1 \\ 1 & 0 \end{bmatrix}$ has order 3, and therefore it defines a
matrix representation $\{I, A, A^2\}$ of the cyclic group $G$ of order 3. The averaging
process (2.7) will produce a G-invariant form from the standard hermitian product $X^*Y$
on $\mathbb{C}^2$. It is

(2.10) $\langle X, Y \rangle = \frac{1}{3} [X^*Y + (AX)^*(AY) + (A^2X)^*(A^2Y)] = X^*BY$,

where

(2.11) $B = \frac{1}{3} [I + A^*A + (A^2)^*(A^2)] = \frac{2}{3} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$.
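
The averaging in this example is small enough to carry out by machine. The Python sketch below is an illustration only: it assumes the matrix $A$ with rows $(-1, -1)$ and $(1, 0)$, uses the transpose in place of the adjoint because the entries are real, and the helper names are hypothetical. It forms the averaged matrix $B$ exactly and checks that the form $X^*BY$ is unchanged when $X$ and $Y$ are both multiplied by $A$, which amounts to $A^*BA = B$.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two square matrices stored as tuples of tuples."""
    n = len(A)
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def transpose(A):
    """Transpose; equal to the adjoint for real matrices."""
    return tuple(zip(*A))

I2 = ((1, 0), (0, 1))
A = ((-1, -1), (1, 0))          # assumed matrix of order 3
GROUP = [I2, A, matmul(A, A)]   # the matrices representing the cyclic group

# Sum g^t g over the group, then divide by the group order.
total = [[0, 0], [0, 0]]
for g in GROUP:
    gtg = matmul(transpose(g), g)
    for i in range(2):
        for j in range(2):
            total[i][j] += gtg[i][j]
B = tuple(tuple(Fraction(entry, len(GROUP)) for entry in row) for row in total)

# G-invariance of the form X^t B Y is equivalent to A^t B A = B.
INVARIANT = matmul(matmul(transpose(A), B), A) == B
print(B, INVARIANT)
```

The standard form itself is not invariant here, since $A^*A \neq I$; only the averaged matrix $B$ passes the test.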


3. COMPACT GROUPS 

A linear group is called compact if it is a closed and bounded subset of the space of 
matrices [Appendix (3.8)]. The most important compact groups are the orthogonal 
and unitary groups: 
(3.1) Proposition. The orthogonal and unitary groups are compact.

Proof. The columns of an orthogonal matrix $P$ form an orthonormal basis, so
they have length 1. Hence all of the matrix entries have absolute value $\le 1$. This
shows that $O_n$ is contained in the box defined by the inequalities $|p_{ij}| \le 1$. So it is a
bounded set. Because it is defined as the common zeros of a set of continuous
functions, it is closed too, hence compact. The proof for the unitary group is the same. □


The main theorems (2.2, 2.6) of Section 2 carry over to compact linear groups
without major change. We will work out the case of the circle group $G = SO_2$ as an
example. The rotation of the plane through the angle $\theta$ was denoted by $\rho_\theta$ in Chapter
5. Here we will consider an arbitrary representation of $G$. To avoid confusion, we
denote the element

(3.2) $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \in SO_2$

by its angle $\theta$, rather than by $\rho_\theta$. Formula (3.2) defines a particular matrix
representation of our group, but there are others.

Suppose we are given a continuous representation $\sigma$ of $G$ on a finite-dimensional
space $V$, not necessarily the representation (3.2). Since the group law is addition
of angles, the rule for working with $\sigma$ is $\sigma_{\theta + \eta} = \sigma_\theta \sigma_\eta$. To say that the
operation is continuous means that if we choose a basis for $V$, thereby representing the
operation of $\theta$ on $V$ by some matrix $S_\theta$, then the entries of $S_\theta$ are continuous
functions of $\theta$.

Let us try to copy the proof of (2.6). To average over the infinite group $G$, we
replace summation by an integral. We choose any positive definite hermitian form
$\{\ ,\ \}$ on $V$ and define a new form by the rule

(3.3) $\langle v, w \rangle = \frac{1}{2\pi} \int_0^{2\pi} \{\sigma_\theta v, \sigma_\theta w\}\, d\theta$.


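This form has the required properties. To check G-invariance, fix any element
$\theta_0 \in G$, and let $\eta = \theta + \theta_0$. Then $d\eta = d\theta$. Hence

(3.4) $\langle \sigma_{\theta_0} v, \sigma_{\theta_0} w \rangle = \frac{1}{2\pi} \int_0^{2\pi} \{\sigma_\theta \sigma_{\theta_0} v, \sigma_\theta \sigma_{\theta_0} w\}\, d\theta = \frac{1}{2\pi} \int_0^{2\pi} \{\sigma_\eta v, \sigma_\eta w\}\, d\eta = \langle v, w \rangle$,

as required.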
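
The averaging over the circle can also be tried numerically. The Python sketch below is an illustration under stated assumptions: the representation $S_\theta = P R_\theta P^{-1}$ is a made-up example obtained by conjugating the rotation matrices by a non-orthogonal matrix `P`, the starting form is the standard dot product on real column vectors, and the integral is replaced by an average over equally spaced angles. The averaged form is stored as a matrix `B`, and invariance under a fixed angle `THETA0` is the statement that $S_{\theta_0}^t B S_{\theta_0}$ agrees with $B$ up to rounding.

```python
import math

def matmul(A, B):
    """Product of two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

P = [[1.0, 1.0], [0.0, 1.0]]        # hypothetical change of basis
P_INV = [[1.0, -1.0], [0.0, 1.0]]

def S(theta):
    """Assumed non-unitary representation of the circle group."""
    c, s = math.cos(theta), math.sin(theta)
    return matmul(matmul(P, [[c, -s], [s, c]]), P_INV)

def averaged_form(samples=360):
    """Average of S_theta^t S_theta over equally spaced angles."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(samples):
        M = S(2 * math.pi * k / samples)
        MtM = matmul(transpose(M), M)
        for i in range(2):
            for j in range(2):
                total[i][j] += MtM[i][j] / samples
    return total

B = averaged_form()
THETA0 = 0.7
M0 = S(THETA0)
B_MOVED = matmul(matmul(transpose(M0), B), M0)
MAX_ERROR = max(abs(B_MOVED[i][j] - B[i][j]) for i in range(2) for j in range(2))
print(MAX_ERROR)
```

For this particular example the entries being averaged are trigonometric polynomials of low degree, so the equally spaced average reproduces the integral up to rounding; in general a numerical average would only approximate it.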

We will not carry the proof through for general groups because some serious
work has to be done to find a suitable volume element analogous to $d\theta$ in a given
compact group $G$. In the computation (3.4), it is crucial that $d\theta = d(\theta + \theta_0)$, and
we were lucky that the obvious integral was the one to use.

For any compact group $G$ there is a volume element $dg$ called Haar measure,
which has the property of being translation invariant: If $g_0 \in G$ is fixed and
$g' = gg_0$, then

(3.5) $dg = dg'$.

Using this measure, the proof carries over. We will not prove the existence of a
Haar measure, but assuming one exists, the same reasoning as in (2.8) proves the
following analogue of (2.6) and (2.2):

(3.6) Corollary. Let $G$ be a compact subgroup of $GL_n$. Then

(a) Let $\sigma$ be a representation of $G$ on a finite-dimensional vector space $V$. There is
a G-invariant, positive definite hermitian form $\langle\ ,\ \rangle$ on $V$.

(b) Every continuous matrix representation $R$ of $G$ is conjugate to a unitary
representation.

(c) Every compact subgroup $G$ of $GL_n$ is conjugate to a subgroup of $U_n$. □


4. G-INVARIANT SUBSPACES AND IRREDUCIBLE REPRESENTATIONS

Given a representation of a finite group $G$ on a vector space $V$, Corollary (2.3) tells
us that for each group element $g$ there is a basis of $V$ so that the matrix of the
operator $\rho_g$ is diagonal. Obviously, it would be very convenient to have a single basis
which would diagonalize $\rho_g$ for all group elements $g$ at the same time. But such a
basis doesn't exist very often, because any two diagonal matrices commute with each
other. In order to diagonalize the matrices of all $\rho_g$ at the same time, these operators
must commute. It follows that any group $G$ which has a faithful representation by
diagonal matrices is abelian. We will see later (Section 8) that the converse is also
true. If $G$ is a finite abelian group, then every matrix representation $R$ of $G$ is
diagonalizable; that is, there is a single matrix $P$ so that $PR_gP^{-1}$ is diagonal for all $g \in G$.
In this section we discuss what can be done for finite groups in general.

Let $\rho$ be a representation of a group $G$ on a vector space $V$. A subspace $W$ of $V$ is
called G-invariant if

(4.1) $gw \in W$, for all $w \in W$ and $g \in G$.

So the operation by every group element $g$ must carry $W$ to itself, that is, $gW \subset W$.
This is an extension of the concept of T-invariant subspace introduced in Section 3
of Chapter 4. In a representation, the elements of $G$ represent linear operators on $V$,
and we ask that $W$ be an invariant subspace for each of these operators. If $W$ is
G-invariant, the operation of $G$ on $V$ will restrict to an operation on $W$.

As an example, consider the three-dimensional representation of the dihedral
group defined by the symmetries of an n-gon $\Delta$ [Chapter 5 (9.1)]. So $G = D_n$.
There are two proper G-invariant subspaces: The plane containing $\Delta$ and the line
perpendicular to $\Delta$. On the other hand, there is no proper T-invariant subspace for
the representation (1.2) of the tetrahedral group $T$, because there is no line or plane
which is carried to itself by every element of $T$.

If a representation $\rho$ of a group $G$ on a nonzero vector space $V$ has no proper
G-invariant subspace, it is called an irreducible representation. If there is a proper
invariant subspace, $\rho$ is said to be reducible. The standard three-dimensional
representation of $T$ is irreducible.

When V is the direct sum of G-invariant subspaces, $V = W_1 \oplus W_2$, the
representation $\rho$ on V is said to be the direct sum of its restrictions $\rho_i$ to $W_i$, and we write

(4.2)   $\rho = \rho_1 \oplus \rho_2$.

Suppose this is the case. Choose bases $B_1, B_2$ of $W_1, W_2$, and let $B = (B_1, B_2)$ be the
basis of V obtained by listing these two bases in order [Chapter 3 (6.6)]. Then the
matrix $R_g$ of $\rho_g$ will have the block form

(4.3)   $R_g = \begin{bmatrix} A_g & 0 \\ 0 & B_g \end{bmatrix}$,

where $A_g$ is the matrix of $\rho_{1g}$ with respect to $B_1$ and $B_g$ is the matrix of $\rho_{2g}$ with
respect to $B_2$. Conversely, if the matrices $R_g$ have such a block form, then the
representation is a direct sum.

For example, consider the rotation group $G = D_n$ operating on $\mathbb{R}^3$ by
symmetries of an n-gon $\Delta$. If we choose an orthonormal basis B so that $v_1$ is
perpendicular to the plane of $\Delta$ and $v_2$ passes through a vertex, then the rotations corresponding
to our standard generators x, y [Chapter 5 (3.6)] are represented by the matrices

(4.4)   $R_x = \begin{bmatrix} 1 & & \\ & c_n & -s_n \\ & s_n & c_n \end{bmatrix}$,   $R_y = \begin{bmatrix} -1 & & \\ & 1 & \\ & & -1 \end{bmatrix}$,

where $c_n = \cos(2\pi/n)$ and $s_n = \sin(2\pi/n)$. So R is a direct sum of a
one-dimensional representation A,

(4.5)   $A_x = [1]$,   $A_y = [-1]$,

and a two-dimensional representation B,

(4.6)   $B_x = \begin{bmatrix} c_n & -s_n \\ s_n & c_n \end{bmatrix}$,   $B_y = \begin{bmatrix} 1 & \\ & -1 \end{bmatrix}$.

The representation B is the basic two-dimensional representation of $D_n$ as
symmetries of $\Delta$ in the plane.
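The block form can be checked numerically. The following is a minimal sketch, not part of the text: it assumes that the standard generators satisfy the usual dihedral relations $x^n = 1$, $y^2 = 1$, $yxy^{-1} = x^{-1}$, and the helper names (`dihedral_generators`, `has_block_form`) and the choice n = 5 are ours.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]

def mat_pow(a, exponent):
    result = identity(len(a))
    for _ in range(exponent):
        result = matmul(result, a)
    return result

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def dihedral_generators(n):
    """Matrices R_x, R_y of (4.4) for the n-gon."""
    c, s = math.cos(2 * math.pi / n), math.sin(2 * math.pi / n)
    r_x = [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]
    r_y = [[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
    return r_x, r_y

def has_block_form(m, tol=1e-12):
    """True if m splits as a 1x1 block followed by a 2x2 block, as in (4.3)."""
    return all(abs(m[0][j]) < tol and abs(m[j][0]) < tol for j in (1, 2))

N_GON = 5  # illustrative choice of n
R_X, R_Y = dihedral_generators(N_GON)

# Assumed relations: x^n = 1, y^2 = 1, y x y^{-1} = x^{-1} (here y^{-1} = y).
relations_hold = (
    close(mat_pow(R_X, N_GON), identity(3))
    and close(mat_pow(R_Y, 2), identity(3))
    and close(matmul(matmul(R_Y, R_X), R_Y), mat_pow(R_X, N_GON - 1))
)
```

Since both generators have the block form, so does every product of them, which is the matrix version of the statement that R is the direct sum of A and B.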

On the other hand, even if a representation $\rho$ is reducible, the matrices $R_g$ will
not have a block form unless the given basis for V is compatible with the direct sum
decomposition. Until we have made a further analysis, it will be difficult to tell that
a representation is reducible, when it is presented using the wrong basis.


(4.7) Proposition. Let $\rho$ be a unitary representation of G on a hermitian vector
space V, and let W be a G-invariant subspace. The orthogonal complement $W^\perp$ is
also G-invariant, and $\rho$ is a direct sum of its restrictions to W and $W^\perp$.

Proof. Let $v \in W^\perp$, so that $v \perp W$. Since the operators $\rho_g$ are unitary, they
preserve orthogonality [Chapter 7 (5.2)], so $gv \perp gW$. Since W is G-invariant,
$W = gW$, so $gv \perp W$. Therefore $gv \in W^\perp$. This shows that $W^\perp$ is G-invariant. We
know that $V = W \oplus W^\perp$ by Chapter 7 (2.7). □

This proposition allows us to decompose a representation as a direct sum,
provided that there is a proper invariant subspace. Together with induction, this gives us
the following corollary:

(4.8) Corollary. Every unitary representation $\rho: G \to GL(V)$ on a hermitian
vector space V is a direct sum of irreducible representations. □

Combining this corollary with (2.2), we obtain the following:

(4.9) Corollary. Maschke's Theorem: Every representation of a finite group G is a
direct sum of irreducible representations. □


5. CHARACTERS 


Two representations $\rho: G \to GL(V)$ and $\rho': G \to GL(V')$ of a group G are
called isomorphic, or equivalent, if there is an isomorphism of vector spaces
$T: V \to V'$ which is compatible with the operation of G:

(5.1)   $gT(v) = T(gv)$   or   $\rho'_g(T(v)) = T(\rho_g(v))$,

for all $v \in V$ and $g \in G$. If B is a basis for V and if $B' = T(B)$ is the corresponding
basis of V', then the associated matrix representations $R_g$ and $R'_g$ will be equal.

For the next four sections, we restrict our attention to representations of finite
groups. We will see that there are relatively few isomorphism classes of irreducible
representations of a finite group. However, each representation has a complicated
description in terms of matrices. The secret to understanding representations is not
to write down the matrices explicitly unless absolutely necessary. So to facilitate
classification we will throw out most of the information contained in a representation
$\rho$, keeping only an essential part. What we will work with is the trace, called the
character, of $\rho$. Characters are usually denoted by $\chi$.

The character $\chi$ of a representation $\rho$ is the map $G \to \mathbb{C}$ defined by

(5.2)   $\chi(g) = \operatorname{trace}(\rho_g)$.

If R is the matrix representation obtained from $\rho$ by a choice of basis for V, then

(5.3)   $\chi(g) = \operatorname{trace}(R_g) = \lambda_1 + \cdots + \lambda_n$,

where $\lambda_i$ are the eigenvalues of $R_g$, or of $\rho_g$.

The dimension of a character $\chi$ is defined to be the dimension of the
representation $\rho$. The character of an irreducible representation is called an irreducible
character.

Here are some basic properties of the character:

(5.4) Proposition. Let $\chi$ be the character of a representation $\rho$ of a finite group G
on a vector space V.

(a) $\chi(1)$ is the dimension of the character [the dimension of V].

(b) $\chi(g) = \chi(hgh^{-1})$ for all $g, h \in G$. In other words, the character is constant on
each conjugacy class.

(c) $\chi(g^{-1}) = \overline{\chi(g)}$ [the complex conjugate of $\chi(g)$].

(d) If $\chi'$ is the character of another representation $\rho'$, then the character of the
direct sum $\rho \oplus \rho'$ is $\chi + \chi'$.

Proof. The symbol 1 in assertion (a) denotes the identity element of G. This
property is trivial: $\chi(1) = \operatorname{trace} I = \dim V$. Property (b) is true because the
matrix representation R associated to $\rho$ is a homomorphism, which shows that
$R_{hgh^{-1}} = R_h R_g R_h^{-1}$, and because $\operatorname{trace}(R_h R_g R_h^{-1}) = \operatorname{trace} R_g$ [Chapter 4 (4.18)].
Property (d) is also clear, because the trace of the block matrix (4.3) is the sum of
the traces of $A_g$ and $B_g$.

Property (c) is less obvious. If the eigenvalues of $R_g$ are $\lambda_1, \ldots, \lambda_n$, then the
eigenvalues of $R_{g^{-1}} = (R_g)^{-1}$ are $\lambda_1^{-1}, \ldots, \lambda_n^{-1}$. The assertion of (c) is

$\chi(g^{-1}) = \lambda_1^{-1} + \cdots + \lambda_n^{-1} = \overline{\lambda_1} + \cdots + \overline{\lambda_n} = \overline{\chi(g)}$,

and to show this we use the fact that G is a finite group. Every element g of G has
finite order. If $g^r = 1$, then $R_g$ is a matrix of order r, so its eigenvalues $\lambda_1, \ldots, \lambda_n$ are
roots of unity. This implies that $|\lambda_i| = 1$, hence that $\lambda_i^{-1} = \overline{\lambda_i}$ for each i. □
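Properties (a), (b), and (c) are easy to test on small examples. The sketch below is only an illustration: it uses the matrices (4.6) with n = 3 for property (b), and for property (c) a two-dimensional representation of the cyclic group of order 3 sending x to diag(1, ζ), which is our own choice of example and not one taken from the text.

```python
import cmath
import math
from itertools import product

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(column) for column in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# The two-dimensional representation (4.6) of D_3.
C, S = math.cos(2 * math.pi / 3), math.sin(2 * math.pi / 3)
B_X = [[C, -S], [S, C]]
B_Y = [[1.0, 0.0], [0.0, -1.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

def word(i, j):
    """Matrix representing the element x^i y^j."""
    m = I2
    for _ in range(i):
        m = matmul(m, B_X)
    for _ in range(j):
        m = matmul(m, B_Y)
    return m

D3 = [word(i, j) for i in range(3) for j in range(2)]

# (a): the character value at the identity is the dimension.
dimension_ok = abs(trace(I2) - 2) < 1e-12

# (b): these matrices are orthogonal, so the inverse of h is its transpose.
conjugation_ok = all(
    abs(trace(matmul(matmul(h, g), transpose(h))) - trace(g)) < 1e-9
    for g, h in product(D3, repeat=2)
)

# (c): a complex example, x -> diag(1, zeta) on the cyclic group of order 3.
ZETA = cmath.exp(2j * cmath.pi / 3)

def rho(k):
    """Matrix of x^k in the illustrative representation x -> diag(1, zeta)."""
    return [[1, 0], [0, ZETA ** k]]

inverse_ok = all(
    abs(trace(rho(-k)) - trace(rho(k)).conjugate()) < 1e-9 for k in range(3)
)
```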

In order to avoid confusing cyclic groups with conjugacy classes, we will
denote conjugacy classes by the roman letter C, rather than an italic C, in this chapter.
Thus the conjugacy class of an element $g \in G$ will be denoted by $\mathrm{C}_g$.

We shall note two things which simplify the computation of a character. First
of all, since the value of $\chi$ depends only on the conjugacy class of an element $g \in G$
(5.4b), we need only determine the values of $\chi$ on one representative element in
each class. Second, since the value of the character $\chi(g)$ is the trace of the operator
$\rho_g$ and since the trace doesn't depend on the choice of a basis, we are free to choose
a convenient one. Moreover, we may select a convenient basis for each individual
group element. There is no need to use the same basis for all elements.

As an example, let us determine the character $\chi$ of the rotation representation
of the tetrahedral group T defined by (1.2). There are four conjugacy classes in T,
and they are represented by the elements $1, x, x^2, y$, where as before x is a rotation
by $2\pi/3$ about a vertex and y is a rotation by $\pi$ about the center of an edge. The
values of the character on these representatives can be read off from the matrices
(1.2):

(5.5)   $\chi(1) = 3$,   $\chi(x) = 0$,   $\chi(x^2) = 0$,   $\chi(y) = -1$.

It is sometimes useful to think of a character $\chi$ as a vector. We can do this by
listing the elements of G in some order: $G = \{g_1, \ldots, g_N\}$; then the vector
representing $\chi$ will be

(5.6)   $\chi = (\chi(g_1), \ldots, \chi(g_N))^t$.

Since $\chi$ is constant on conjugacy classes, it is natural to list G by listing the
conjugacy classes and then running through each conjugacy class in some order. If we
do this for the character (5.5), listing $\mathrm{C}_1, \mathrm{C}_x, \mathrm{C}_{x^2}, \mathrm{C}_y$ in that order, the vector we
obtain is

(5.7)   $\chi = (3;\ 0,0,0,0;\ 0,0,0,0;\ -1,-1,-1)^t$.

We will not write out such a vector explicitly again.

The main theorem on characters relates them to the hermitian dot product on
$\mathbb{C}^N$. This is one of the most beautiful theorems of algebra, both because its statement
is intrinsically so elegant and because it simplifies the problem of classifying
representations so much. We define

(5.8)   $\langle \chi, \chi' \rangle = \frac{1}{N} \sum_g \overline{\chi(g)}\,\chi'(g)$,

where $N = |G|$. If $\chi, \chi'$ are represented by vectors as in (5.7), this is the standard
hermitian product, renormalized by the factor 1/N.

(5.9) Theorem. Let G be a group of order N, let $\rho_1, \rho_2, \ldots$ represent the distinct
isomorphism classes of irreducible representations of G, and let $\chi_i$ be the character
of $\rho_i$.

(a) Orthogonality Relations: The characters $\chi_i$ are orthonormal. In other words,
$\langle \chi_i, \chi_j \rangle = 0$ if $i \neq j$, and $\langle \chi_i, \chi_i \rangle = 1$ for each i.

(b) There are finitely many isomorphism classes of irreducible representations, the
same number as the number of conjugacy classes in the group.

(c) Let $d_i$ be the dimension of the irreducible representation $\rho_i$, and let r be the
number of irreducible representations. Then $d_i$ divides N, and

(5.10)   $N = d_1^2 + \cdots + d_r^2$.

This theorem will be proved in Section 9, with the exception of the assertion
that $d_i$ divides N, which we will not prove.

A complex-valued function $\varphi: G \to \mathbb{C}$ which is constant on each conjugacy
class is called a class function. Since a class function is constant on each class, it
may also be described as a function on the set of conjugacy classes. The class
functions form a complex vector space, which we denote by $\mathscr{C}$. We use the form defined
by (5.8) to make $\mathscr{C}$ into a hermitian space.

(5.11) Corollary. The irreducible characters form an orthonormal basis of $\mathscr{C}$.

This follows from (5.9a,b). The characters are linearly independent because
they are orthogonal, and they span because the dimension of $\mathscr{C}$ is the number of
conjugacy classes, which is r. □

The corollary allows us to decompose a given character as a linear
combination of the irreducible characters, using the formula for orthogonal projection
[Chapter 7 (3.8)]. For let $\chi$ be the character of a representation $\rho$. By Corollary
(4.9), $\rho$ is isomorphic to a direct sum of the irreducible representations $\rho_1, \ldots, \rho_r$;
say we write this symbolically as $\rho = n_1\rho_1 \oplus \cdots \oplus n_r\rho_r$, where $n_i$ are nonnegative
integers and where $n\rho$ stands for the direct sum of n copies of the representation $\rho$.
Then $\chi = n_1\chi_1 + \cdots + n_r\chi_r$. Since $(\chi_1, \ldots, \chi_r)$ is an orthonormal basis, we have the
following:

(5.12) Corollary. Let $\chi_1, \ldots, \chi_r$ be the irreducible characters of a finite group G,
and let $\chi$ be any character. Then $\chi = n_1\chi_1 + \cdots + n_r\chi_r$, where $n_i = \langle \chi, \chi_i \rangle$. □

(5.13) Corollary. If two representations $\rho, \rho'$ have the same character, they are
isomorphic.

For let $\chi, \chi'$ be the characters of two representations $\rho, \rho'$, where
$\rho = n_1\rho_1 \oplus \cdots \oplus n_r\rho_r$ and $\rho' = n_1'\rho_1 \oplus \cdots \oplus n_r'\rho_r$. Then the characters of these
representations are $\chi = n_1\chi_1 + \cdots + n_r\chi_r$ and $\chi' = n_1'\chi_1 + \cdots + n_r'\chi_r$. Since
$\chi_1, \ldots, \chi_r$ are linearly independent, $\chi = \chi'$ implies that $n_i = n_i'$ for each i. □

(5.14) Corollary. A character $\chi$ has the property $\langle \chi, \chi \rangle = 1$ if and only if it is
irreducible.

For if $\chi = n_1\chi_1 + \cdots + n_r\chi_r$, then $\langle \chi, \chi \rangle = n_1^2 + \cdots + n_r^2$. This gives the value 1
if and only if a single $n_i$ is 1 and the rest are zero. □

The evaluation of $\langle \chi, \chi \rangle$ is a very practical way to check irreducibility of a
representation. For example, let $\chi$ be the character (5.7) of the representation (1.2).
Then $\langle \chi, \chi \rangle = (3^2 + 1 + 1 + 1)/12 = 1$. So $\chi$ is irreducible.
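The same evaluation can be organized class by class, which is how one usually computes in practice. The following sketch assumes that a character is stored as one value per conjugacy class together with the class sizes; the function names are ours.

```python
def inner_product(chi, psi, class_sizes):
    """Hermitian product (5.8), summed one conjugacy class at a time.

    chi and psi hold one character value per class; class_sizes holds the
    number of group elements in each class, so their sum is the group order.
    """
    order = sum(class_sizes)
    total = sum(size * complex(a).conjugate() * complex(b)
                for size, a, b in zip(class_sizes, chi, psi))
    return total / order

def is_irreducible(chi, class_sizes, tol=1e-9):
    """Criterion (5.14): chi is irreducible exactly when <chi, chi> = 1."""
    return abs(inner_product(chi, chi, class_sizes) - 1) < tol

T_CLASS_SIZES = [1, 4, 4, 3]         # classes of 1, x, x^2, y in T
ROTATION_CHARACTER = [3, 0, 0, -1]   # the values (5.5)

rotation_is_irreducible = is_irreducible(ROTATION_CHARACTER, T_CLASS_SIZES)
```

Here the sum is (1·9 + 4·0 + 4·0 + 3·1)/12, the same twelve terms as in the vector (5.7), grouped by class.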

Part (c) of Theorem (5.9) should be contrasted with the Class Equation
[Chapter 6 (1.7)]. Let $\mathrm{C}_1, \ldots, \mathrm{C}_r$ be the conjugacy classes in G, and let $c_i = |\mathrm{C}_i|$ be
the order of the conjugacy class. Then $c_i$ divides N, and $N = c_1 + \cdots + c_r$. Though
there is the same number of conjugacy classes as irreducible representations, their
exact relationship is very subtle.

As our first example, we will determine the irreducible representations of the
dihedral group $D_3$ [Chapter 5 (3.6)]. There are three conjugacy classes, $\mathrm{C}_1 = \{1\}$,
$\mathrm{C}_2 = \{y, xy, x^2y\}$, $\mathrm{C}_3 = \{x, x^2\}$ [Chapter 6 (1.8)], and therefore three irreducible
representations. The only solution of equation (5.10) is $6 = 1^2 + 1^2 + 2^2$, so $D_3$ has
two one-dimensional representations $\rho_1, \rho_2$ and one irreducible two-dimensional
representation $\rho_3$. Every group G has the trivial one-dimensional representation
($R_g = 1$ for all g); let us call it $\rho_1$. The other one-dimensional representation is the
sign representation of the symmetric group $S_3$, which is isomorphic to $D_3$:
$R_g = \operatorname{sign}(g) = \pm 1$. This is the representation (4.5); let us call it $\rho_2$. The
two-dimensional representation is defined by (4.6); call it $\rho_3$.

Rather than listing the characters $\chi_i$ as vectors, we usually assemble them into
a character table. In this table, the three conjugacy classes are represented by the
elements 1, y, x. The orders of the conjugacy classes are given above them. Thus
$|\mathrm{C}_y| = 3$.


                               conjugacy class

                               (1)    (3)    (2)      order of the class
                                1      y      x       representative element

                  $\chi_1$      1      1      1
   irreducible    $\chi_2$      1     -1      1       value of the
   character      $\chi_3$      2      0     -1       character


(5.15) CHARACTER TABLE FOR $D_3$


In such a table, the top row, corresponding to the trivial character, consists
entirely of 1's. The first column contains the dimensions of the representations,
because $\chi_i(1) = \dim \rho_i$.

To evaluate the bilinear form (5.8) on the characters, remember that there are
three elements in the class of y and two in the class of x. Thus

$\langle \chi_3, \chi_3 \rangle = \frac{1}{6}\sum_g \overline{\chi_3(g)}\,\chi_3(g) = \left(1\cdot\overline{\chi_3(1)}\chi_3(1) + 3\cdot\overline{\chi_3(y)}\chi_3(y) + 2\cdot\overline{\chi_3(x)}\chi_3(x)\right)/6$

$= (1\cdot 2\cdot 2 + 3\cdot 0\cdot 0 + 2\cdot(-1)\cdot(-1))/6 = 1$.

This confirms the fact that $\rho_3$ is irreducible. □


As another example, consider the cyclic group $C_3 = \{1, x, x^2\}$ of order 3.
Since $C_3$ is abelian, there are three conjugacy classes, each consisting of one
element. Theorem (5.9) shows that there are three irreducible representations, and that
each has dimension 1. Let $\zeta = \frac{1}{2}(-1 + \sqrt{3}\,i)$ be a cube root of 1. The three
representations are

(5.16)   $\rho_{1x} = 1$,   $\rho_{2x} = \zeta$,   $\rho_{3x} = \zeta^2$.

                  1       x       $x^2$

   $\chi_1$       1       1       1
   $\chi_2$       1       $\zeta$     $\zeta^2$
   $\chi_3$       1       $\zeta^2$   $\zeta$


(5.17) CHARACTER TABLE FOR $C_3$
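The rows of such a table can be tested against the orthogonality relations directly. The sketch below is an illustration only; it takes the rows of the table for the cyclic group of order 3 to be (1, 1, 1), (1, ζ, ζ²), (1, ζ², ζ), and it also shows why the complex conjugate in (5.8) cannot be dropped. The names `GRAM` and `without_conjugation` are ours.

```python
import cmath

ZETA = cmath.exp(2j * cmath.pi / 3)  # equals (-1 + sqrt(3) i) / 2

C3_TABLE = [
    [1, 1, 1],              # chi_1
    [1, ZETA, ZETA ** 2],   # chi_2
    [1, ZETA ** 2, ZETA],   # chi_3
]

def inner_product(chi, psi):
    """Form (5.8) for a group whose conjugacy classes are single elements."""
    return sum(complex(a).conjugate() * b for a, b in zip(chi, psi)) / len(chi)

# Matrix of all products <chi_i, chi_j>; orthonormality means it is the identity.
GRAM = [[inner_product(chi, psi) for psi in C3_TABLE] for chi in C3_TABLE]

# Without the conjugate, chi_2 and chi_3 would not look orthogonal:
# (1 + zeta^3 + zeta^3) / 3 = 1.
without_conjugation = sum(a * b for a, b in zip(C3_TABLE[1], C3_TABLE[2])) / 3
```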


Note that $\overline{\zeta} = \zeta^2$. So

$\langle \chi_2, \chi_3 \rangle = (1\cdot 1 + \overline{\zeta}\,\zeta^2 + \overline{\zeta^2}\,\zeta)/3 = (1 + \zeta + \zeta^2)/3 = 0$,

which agrees with the orthogonality relations.

As a third example, let us determine the character table of the tetrahedral
group T. The conjugacy classes $\mathrm{C}_1, \mathrm{C}_x, \mathrm{C}_{x^2}, \mathrm{C}_y$ were determined above, and
the Class Equation is 12 = 1 + 4 + 4 + 3. The only solution of (5.10) is
$12 = 1^2 + 1^2 + 1^2 + 3^2$, so there are four irreducible representations, of dimensions
1, 1, 1, 3. Now it happens that T has a normal subgroup H of order 4 which is
isomorphic to the Klein four group, and such that the quotient $\overline{T} = T/H$ is cyclic of
order 3. Any representation $\overline{\rho}$ of $\overline{T}$ will give a representation of T by composition:

$T \to \overline{T} \xrightarrow{\ \overline{\rho}\ } GL(V)$.

Thus the three one-dimensional representations of the cyclic group determine
representations of T. Their characters $\chi_1, \chi_2, \chi_3$ can be determined from (5.17). The
character (5.5) is denoted by $\chi_4$ in the table below.

                 (1)     (4)     (4)     (3)
                  1       x      $x^2$     y

   $\chi_1$       1       1       1       1
   $\chi_2$       1       $\zeta$     $\zeta^2$     1
   $\chi_3$       1       $\zeta^2$   $\zeta$       1
   $\chi_4$       3       0       0      -1


(5.18) CHARACTER TABLE FOR T

Various properties of the group can be read off easily from the character table.
Let us forget that this is the character table for T, and suppose that it has been given
to us as the character table of an unknown group G. After all, it is conceivable that
another isomorphism class of groups has the same characters.

The order of G is 12, the sum of the orders of the conjugacy classes. Next,
since the dimension of $\rho_2$ is 1, $\chi_2(y)$ is the trace of the 1 × 1 matrix $\rho_{2y}$. So the fact
that $\chi_2(y) = 1$ shows that $\rho_{2y} = 1$ too, that is, that y is in the kernel of $\rho_2$. In fact,
the kernel of $\rho_2$ is identified as the union of the two conjugacy classes $\mathrm{C}_1 \cup \mathrm{C}_y$. This
is a subgroup H of order 4 in G. Moreover, H is the Klein four group. For if H were
$C_4$, its unique element of order 2 would have to be in a conjugacy class by itself. It
also follows from the value of $\chi_2(x)$ that the order of x is divisible by 3. Going back
to our list [Chapter 6 (5.1)] of groups of order 12, we see that $G \approx T$.


6. PERMUTATION REPRESENTATIONS AND
THE REGULAR REPRESENTATION

Let S be a set. We can construct a representation of a group G from an operation of
G on S, by passing to the vector space V = V(S) of formal linear combinations
[Chapter 3 (3.21)]

$v = \sum_i a_i s_i$,   $a_i \in \mathbb{C}$.

An element $g \in G$ operates on vectors by permuting the elements of S, leaving the
coefficients alone:

(6.1)   $gv = \sum_i a_i\, g s_i$.

If we choose an ordering $s_1, \ldots, s_n$ of S and take the basis $(s_1, \ldots, s_n)$ for V, then
$R_g$ is the permutation matrix which describes the operation of g on S.

For example, let G = T and let S be the set of faces of the tetrahedron:
$S = (f_1, \ldots, f_4)$. The operation of G on S defines a four-dimensional representation
of G. Let x denote the rotation by $2\pi/3$ about a face $f_1$ and y the rotation by $\pi$ about
an edge as before. Then if the faces are numbered appropriately, we will have

(6.2)   $R_x = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$   and   $R_y = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$.

We will call $\rho$ (or R) the representation associated to the operation of G on S
and will often refer to $\rho$ as a permutation representation, though that expression has
a meaning in another context as well (Chapter 5, Section 8).

If we decompose a set on which G operates into orbits, we will obtain a
decomposition of the associated representation as a direct sum. This is clear. But there
is an important new feature: The fact that linear combinations are available in
V(S) allows us to decompose the representation further. Even though S may consist
of a single orbit, the associated permutation representation $\rho$ will never be
irreducible, unless S has only one element. This is because the vector $w = s_1 + \cdots + s_n$
is fixed by every permutation of the basis, and so the one-dimensional subspace
W = {cw} is G-invariant. The trivial representation is a summand of every
permutation representation.

It is easy to compute the character of a permutation representation:

(6.3)   $\chi(g)$ = number of elements of S fixed by g,

because for every index fixed by a permutation, there is a 1 on the diagonal of the
associated permutation matrix, and the other diagonal entries are 0. For example,
the character $\chi$ of the representation of T on the faces of a tetrahedron is

(6.4)
                  1       x      $x^2$     y
   $\chi$         4       1       1       0

and the character table (5.18) shows that $\chi = \chi_1 + \chi_4$. Therefore $\rho \approx \rho_1 \oplus \rho_4$ by
Corollary (5.13). As another example, the character of the operation of T on the six
edges of the tetrahedron is

(6.5)
                  1       x      $x^2$     y
   $\chi$         6       0       0       2

and using (5.18) again, we find that $\chi = \chi_1 + \chi_2 + \chi_3 + \chi_4$.
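Both decompositions can be recovered mechanically from the projection formula (5.12). The sketch below is hedged in one respect: it assumes that the character table of T has the rows (1, 1, 1, 1), (1, ζ, ζ², 1), (1, ζ², ζ, 1), (3, 0, 0, −1) on the classes of 1, x, x², y, which is what the construction through the quotient T/H and the character (5.5) give. The dictionary and function names are ours.

```python
import cmath

ZETA = cmath.exp(2j * cmath.pi / 3)

CLASS_SIZES = [1, 4, 4, 3]  # classes of 1, x, x^2, y in T

# Assumed rows of the character table of T (see the lead-in).
IRREDUCIBLES = {
    "chi1": [1, 1, 1, 1],
    "chi2": [1, ZETA, ZETA ** 2, 1],
    "chi3": [1, ZETA ** 2, ZETA, 1],
    "chi4": [3, 0, 0, -1],
}

def inner_product(chi, psi):
    """Form (5.8), summed class by class, conjugating the first argument."""
    order = sum(CLASS_SIZES)
    return sum(size * complex(a).conjugate() * b
               for size, a, b in zip(CLASS_SIZES, chi, psi)) / order

def multiplicities(chi):
    """Multiplicity of each irreducible character in chi, as in (5.12)."""
    return {name: round(inner_product(irreducible, chi).real)
            for name, irreducible in IRREDUCIBLES.items()}

FACES = [4, 1, 1, 0]   # character (6.4)
EDGES = [6, 0, 0, 2]   # character (6.5)

face_decomposition = multiplicities(FACES)
edge_decomposition = multiplicities(EDGES)
```

Rounding is safe here only because the multiplicities are known in advance to be nonnegative integers; the floating-point values differ from them by rounding error alone.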

The regular representation $\rho^{\mathrm{reg}}$ of G is the representation associated to the
operation of G on itself by left multiplication. In other words, we let S = G, with the
operation of left multiplication. This is not an especially interesting operation, but its
associated representation is very interesting. Its character $\chi^{\mathrm{reg}}$ is particularly simple:

(6.6)   $\chi^{\mathrm{reg}}(1) = N$, and $\chi^{\mathrm{reg}}(g) = 0$, if $g \neq 1$,

where $N = |G|$. The first formula is clear: $\chi(1) = \dim \rho$ for any representation $\rho$, and
$\rho^{\mathrm{reg}}$ has dimension N. The second follows from (6.3), because multiplication by g
does not fix any element of G, unless g = 1.

Because of this formula, it is easy to compute $\langle \chi^{\mathrm{reg}}, \chi \rangle$ for the character $\chi$ of
any representation $\rho$ by the orthogonal projection formula (5.12). The answer is

(6.7)   $\langle \chi^{\mathrm{reg}}, \chi \rangle = \dim \rho$,

because the only nonzero term in the sum (5.8) is the one with g = 1, and
$\chi(1) = \dim \rho$. This allows us to write $\chi^{\mathrm{reg}}$ as a linear combination of the
irreducible characters:

(6.8) Corollary. $\chi^{\mathrm{reg}} = d_1\chi_1 + \cdots + d_r\chi_r$, and $\rho^{\mathrm{reg}} \approx d_1\rho_1 \oplus \cdots \oplus d_r\rho_r$, where
$d_i$ is the dimension of $\rho_i$ and $d_i\rho_i$ stands for the direct sum of $d_i$ copies of $\rho_i$. □

Isn't this a nice formula? We can deduce formula (5.10) from (6.8) by
counting dimensions. This shows that formula (5.10) of Theorem (5.9) follows from the
orthogonality relations.

For instance, for the group $D_3$, the character of the regular representation is

                              1       x       y
   $\chi^{\mathrm{reg}}$      6       0       0

and Table (5.15) shows that $\chi^{\mathrm{reg}} = \chi_1 + \chi_2 + 2\chi_3$, as expected.
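The character of the regular representation can also be computed by brute force, by counting fixed points as in (6.3). The sketch below models the dihedral group of order 6 as the symmetric group on three letters (the two are isomorphic) with permutations stored as tuples; this encoding and the function names are our own choices.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation p∘q (apply q first), both given as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

GROUP = list(permutations(range(3)))  # the six elements of S_3
IDENTITY = tuple(range(3))

def regular_character(g):
    """Number of group elements h with g h = h, following (6.3)."""
    return sum(1 for h in GROUP if compose(g, h) == h)

def regular_matrix(g):
    """Permutation matrix of left multiplication by g on the basis GROUP."""
    index = {h: i for i, h in enumerate(GROUP)}
    size = len(GROUP)
    matrix = [[0] * size for _ in range(size)]
    for j, h in enumerate(GROUP):
        matrix[index[compose(g, h)]][j] = 1  # column j has its 1 in row of g*h
    return matrix

CHARACTER_VALUES = {g: regular_character(g) for g in GROUP}
```

Only the identity has fixed points, so the values are 6 at the identity and 0 elsewhere, in agreement with (6.6).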

As another example, consider the regular representation R of the cyclic group
$\{1, x, x^2\}$ of order 3. The permutation matrix representing x is

$R_x = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$.

Its eigenvalues are $1, \zeta, \zeta^2$, where $\zeta = \frac{1}{2}(-1 + \sqrt{3}\,i)$. Thus $R_x$ is conjugate to

$\begin{bmatrix} 1 & & \\ & \zeta & \\ & & \zeta^2 \end{bmatrix}$.

This matrix displays the decomposition $\rho^{\mathrm{reg}} \approx \rho_1 \oplus \rho_2 \oplus \rho_3$ of the regular
representation into irreducible one-dimensional representations.
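The eigenvalues can be confirmed by exhibiting eigenvectors. In the sketch below, the vectors $(1, \zeta^{-k}, \zeta^{-2k})$ are our own choice of eigenvectors, not taken from the text, and the matrix is the cyclic permutation matrix with rows (0, 0, 1), (1, 0, 0), (0, 1, 0).

```python
import cmath

ZETA = cmath.exp(2j * cmath.pi / 3)

R_X = [[0, 0, 1],
       [1, 0, 0],
       [0, 1, 0]]

def apply(matrix, vector):
    """Matrix-vector product for lists of rows."""
    return [sum(entry * v for entry, v in zip(row, vector)) for row in matrix]

def eigenvector(k):
    """Candidate eigenvector of R_X for the eigenvalue ZETA**k."""
    return [1, ZETA ** (-k), ZETA ** (-2 * k)]

def residual(k):
    """Largest entry of R_X v - ZETA**k v; zero up to rounding if v is an eigenvector."""
    v = eigenvector(k)
    image = apply(R_X, v)
    return max(abs(a - (ZETA ** k) * b) for a, b in zip(image, v))

RESIDUALS = [residual(k) for k in range(3)]
```

Each of the three lines spanned by these vectors is invariant, and x acts on them by 1, ζ, ζ² respectively, which is the decomposition into one-dimensional representations described above.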

7. THE REPRESENTATIONS OF THE ICOSAHEDRAL GROUP


In this section we determine the irreducible characters of the icosahedral group. So
far, we have seen only its trivial representation $\rho_1$ and the representation of
dimension 3 as a rotation group. Let us denote the rotation representation by $\rho_2$. There are
five conjugacy classes in I [Chapter 6 (2.2)], namely

(7.1)   $\mathrm{C}_1 = \{1\}$,
        $\mathrm{C}_2$ = 15 rotations "x" through the angle $\pi$,
        $\mathrm{C}_3$ = 20 rotations "y" by $2\pi/3$, $4\pi/3$,
        $\mathrm{C}_4$ = 12 rotations "z" by $2\pi/5$, $8\pi/5$,
        $\mathrm{C}_5$ = 12 rotations "$z^2$" by $4\pi/5$, $6\pi/5$,

and therefore there are three more irreducible representations. Given what we know
already, the only solution to (5.10) is $d_i = 1, 3, 3, 4, 5$:

$60 = 1^2 + 3^2 + 3^2 + 4^2 + 5^2$.

We denote the remaining representations by $\rho_3, \rho_4, \rho_5$, where $\dim \rho_3 = 3$, and
so on. A good way to find the missing irreducible representations is to decompose
some known permutation representations. We know that I operates on a set of five
elements [Chapter 6 (2.6)]. This gives us a five-dimensional representation $\rho'$. As
we saw in Section 6, the trivial representation is a summand of $\rho'$. Its orthogonal
complement turns out to be the required irreducible four-dimensional representation:
$\rho' = \rho_1 \oplus \rho_4$. Also, I permutes the set of six axes through the centers of opposite
faces of the dodecahedron. Let the corresponding six-dimensional representation be
$\rho''$. Then $\rho'' = \rho_1 \oplus \rho_5$. We can check this by computing the characters of $\rho_4$ and $\rho_5$
and applying Theorem (5.9). The characters $\chi_4, \chi_5$ are computed from $\chi', \chi''$ by
subtracting $\chi_1 = 1$ from each value (5.4d). For example, $\rho'$ realizes x as an even
permutation of {1, ..., 5} of order 2, so it is a product of two disjoint transpositions,
which fixes one index. Therefore $\chi'(x) = 1$, and $\chi_4(x) = 0$.

The second three-dimensional representation $\rho_3$ is fairly subtle because it is so
similar to $\rho_2$. It can be obtained this way: Since I is isomorphic to $A_5$, we may view
it as a normal subgroup of the symmetric group $S_5$. Conjugation by an element p of
$S_5$ which is not in $A_5$ defines an automorphism $\sigma$ of $A_5$. This automorphism
interchanges the two conjugacy classes $\mathrm{C}_4, \mathrm{C}_5$. The other conjugacy classes are not
interchanged, because their elements have different orders. For example, in cycle
notation, let z = (12345) and let p = (2354). Then $p^{-1}zp = (4532)(12345)(2354) =
(13524) = z^2$. The representation $\rho_3$ is $\rho_2 \circ \sigma$.

The character of $\rho_3$ is computed from that of $\rho_2$ by interchanging the values for
$z, z^2$. Once these characters are computed, verification of the relations $\langle \chi_i, \chi_j \rangle = 0$,
$\langle \chi_i, \chi_i \rangle = 1$ shows that the representations are irreducible and that our list is correct.

                 (1)    (15)    (20)    (12)    (12)
                  1      x       y       z      $z^2$

   $\chi_1$       1      1       1       1       1
   $\chi_2$       3     -1       0      $\alpha$      $\beta$
   $\chi_3$       3     -1       0      $\beta$       $\alpha$
   $\chi_4$       4      0       1      -1      -1
   $\chi_5$       5      1      -1       0       0


(7.2) CHARACTER TABLE FOR $I = A_5$

In this table, $\alpha$ is the trace of a three-dimensional rotation through the angle $2\pi/5$,
which is

$\alpha = 1 + 2\cos 2\pi/5 = \frac{1}{2}(1 + \sqrt{5})$,

and $\beta$ is computed similarly: $\beta = 1 + 2\cos 4\pi/5 = \frac{1}{2}(1 - \sqrt{5})$.
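A few of the orthogonality relations for this table can be verified by hand. The worked computation below is a sketch under the assumption that the classes have orders 1, 15, 20, 12, 12 as in (7.1) and that $\chi_2, \chi_3$ take the values $(3, -1, 0, \alpha, \beta)$ and $(3, -1, 0, \beta, \alpha)$, with $\alpha = 1 + 2\cos 2\pi/5 \approx 1.618$ and $\beta = 1 + 2\cos 4\pi/5 \approx -0.618$. It uses only the identities $\alpha + \beta = 1$ and $\alpha\beta = -1$, which give $\alpha^2 + \beta^2 = 3$.

```latex
\begin{align*}
\langle \chi_1, \chi_2 \rangle
  &= \tfrac{1}{60}\bigl(1\cdot 3 + 15\cdot(-1) + 20\cdot 0 + 12\alpha + 12\beta\bigr)
   = \tfrac{1}{60}\bigl(3 - 15 + 12(\alpha+\beta)\bigr) = 0, \\
\langle \chi_2, \chi_2 \rangle
  &= \tfrac{1}{60}\bigl(1\cdot 9 + 15\cdot 1 + 20\cdot 0 + 12\alpha^2 + 12\beta^2\bigr)
   = \tfrac{1}{60}\bigl(24 + 12\cdot 3\bigr) = 1, \\
\langle \chi_2, \chi_3 \rangle
  &= \tfrac{1}{60}\bigl(1\cdot 9 + 15\cdot 1 + 20\cdot 0 + 12\alpha\beta + 12\beta\alpha\bigr)
   = \tfrac{1}{60}\bigl(24 - 24\bigr) = 0.
\end{align*}
```

The first line is the one that pins down the values of $\alpha$ and $\beta$: it forces $\alpha + \beta = 1$, whereas the norm $\langle \chi_2, \chi_2 \rangle$ depends only on $\alpha^2 + \beta^2$.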

8. ONE-DIMENSIONAL REPRESENTATIONS 

Let $\rho$ be a one-dimensional representation of a group G. So $R_g$ is a 1 × 1 matrix, and
$\chi(g) = R_g$, provided that we identify a 1 × 1 matrix with its single entry. Therefore
in this case the character $\chi$ is a homomorphism $G \to \mathbb{C}^\times$, that is, it satisfies the
rule

(8.1)   $\chi(gh) = \chi(g)\chi(h)$,   if $\dim \rho = 1$.

Such a character is called abelian. Please note that formula (8.1) is not true for
characters of dimension > 1.

If G is a finite group, the values taken on by an abelian character $\chi$ are always
roots of 1:

(8.2)   $\chi(g)^r = 1$

for some r, because the element g has finite order.

The one-dimensional characters form a group under multiplication of
functions:

(8.3)   $\chi\chi'(g) = \chi(g)\chi'(g)$.

This group is called the character group of G and is often denoted by $\hat{G}$. The
character group is especially important when G is abelian, because of the following fact:

(8.4) Theorem. If G is a finite abelian group, then every irreducible
representation of G is one-dimensional.

Proof. Since G is abelian, every conjugacy class consists of one element. So
the number of conjugacy classes is N. By Theorem (5.9), there are N irreducible
representations, and $d_1 = d_2 = \cdots = d_r = 1$. □

9. SCHUR'S LEMMA, AND PROOF OF THE
ORTHOGONALITY RELATIONS

Let $\rho, \rho'$ be representations of a group G on two vector spaces V, V'. We will call a
linear transformation $T: V \to V'$ G-invariant if it is compatible with the two
operations of G on V and V', that is, if

(9.1)   $gT(v) = T(gv)$,   or   $\rho'_g(T(v)) = T(\rho_g(v))$,

for all $g \in G$ and $v \in V$. Thus an isomorphism of representations (Section 5) is a
bijective G-invariant transformation. We could also write (9.1) as

(9.2)   $\rho'_g \circ T = T \circ \rho_g$,   for all $g \in G$.

Let bases B, B' for V and V' be given, and let $R_g$, $R'_g$ and A denote the matrices
of $\rho_g$, $\rho'_g$ and T with respect to these bases. Then (9.2) reads

(9.3)   $R'_g A = A R_g$,   for all $g \in G$.

The special case that $\rho = \rho'$ is very important. A G-invariant linear operator
T on V is one which commutes with $\rho_g$ for every $g \in G$:

(9.4)   $\rho_g \circ T = T \circ \rho_g$   or   $R_g A = A R_g$.

These formulas just repeat (9.2) and (9.3) when $\rho = \rho'$.

(9.5) Proposition. The kernel and image of a G-invariant linear transformation
$T: V \to V'$ are G-invariant subspaces of V and V' respectively.

Proof. The kernel and image of any linear transformation are subspaces. Let
us show that ker T is G-invariant: We want to show that $gv \in \ker T$ if $v \in \ker T$,
or that T(gv) = 0 if T(v) = 0. Well,

$T(gv) = gT(v) = g0 = 0$.

Similarly, if $v' \in \operatorname{im} T$, then $v' = T(v)$ for some $v \in V$. Then

$gv' = gT(v) = T(gv)$,

so $gv' \in \operatorname{im} T$ too. □

(9.6) Theorem. Schur's Lemma: Let $\rho, \rho'$ be two irreducible representations of G
on vector spaces V, V', and let $T: V \to V'$ be a G-invariant transformation.

(a) Either T is an isomorphism, or else T = 0.

(b) If V = V' and $\rho = \rho'$, then T is multiplication by a scalar.

Proof. (a) Since $\rho$ is irreducible and since ker T is a G-invariant subspace,
ker T = V or else ker T = 0. In the first case, T = 0. In the second case, T is
injective and maps V isomorphically to its image. Then im T is not zero. Since $\rho'$ is
irreducible and im T is G-invariant, im T = V'. Therefore T is an isomorphism.

(b) Suppose V = V', so that T is a linear operator on V. Choose an eigenvalue $\lambda$ of
T. Then $(T - \lambda I) = T_1$ is also G-invariant. Its kernel is nonzero because it contains
an eigenvector. Since $\rho$ is irreducible, $\ker T_1 = V$, which implies that $T_1 = 0$.
Therefore $T = \lambda I$. □

The averaging process can be used to create a G-invariant transformation from
any linear transformation $T: V \to V'$. To do this, we rewrite the condition (9.1) in
the form $T(v) = {\rho'_g}^{-1}(T(\rho_g(v)))$, or

(9.7)   $T(v) = g^{-1}(T(gv))$.

The average is the linear operator $\tilde{T}$ defined by

(9.8)   $\tilde{T}(v) = \frac{1}{N}\sum_g g^{-1}(T(gv))$,

where $N = |G|$ as before. If bases for V, V' are given and if the matrices for
$\rho_g, \rho'_g, T, \tilde{T}$ are $R_g, R'_g, A, \tilde{A}$ respectively, then

(9.9)   $\tilde{A} = \frac{1}{N}\sum_g {R'_g}^{-1} A R_g$.

Since compositions of linear transformations and sums of linear transformations are
again linear, $\tilde{T}$ is a linear transformation. To show that it is G-invariant, we fix an
element $h \in G$ and let $g' = gh$. Reindexing as in the proof of Lemma (2.8),

$h^{-1}\tilde{T}(hv) = \frac{1}{N}\sum_g h^{-1}g^{-1}(T(ghv)) = \frac{1}{N}\sum_{g'} g'^{-1}(T(g'v)) = \tilde{T}(v)$.

Therefore $\tilde{T}(hv) = h\tilde{T}(v)$. Since h is arbitrary, this shows that $\tilde{T}$ is G-invariant. □

It may happen that we end up with the trivial linear transformation, that is,
$\tilde{T} = 0$ though T was not zero. In fact, Schur's Lemma tells us that we must get
$\tilde{T} = 0$ if $\rho$ and $\rho'$ are irreducible but not isomorphic. We will make good use of this
seemingly negative fact in the proof of the orthogonality relations.

When $\rho = \rho'$, the average can often be shown to be nonzero by using this
proposition.

(9.10) Proposition. Let $\rho$ be a representation of a finite group G on a vector space
V, and let $T: V \to V$ be a linear operator. Define $\tilde{T}$ by formula (9.8). Then
$\operatorname{trace} \tilde{T} = \operatorname{trace} T$. Thus if the trace of T isn't zero, then $\tilde{T}$ is not zero either.

Proof. We compute as in formula (9.9), with R' = R. Since $\operatorname{trace} A =
\operatorname{trace} R_g^{-1} A R_g$, the proposition follows. □
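The averaging formula (9.9) is easy to carry out exactly for a small group. The sketch below uses the regular representation of the cyclic group of order 3, generated by the cyclic permutation matrix, and exact rational arithmetic. The matrix `A` is a hypothetical example chosen here for illustration; it is not a matrix from the text.

```python
from fractions import Fraction

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(column) for column in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

R_X = [[0, 0, 1],
       [1, 0, 0],
       [0, 1, 0]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
GROUP = [IDENTITY, R_X, matmul(R_X, R_X)]  # matrices of 1, x, x^2

def average(a):
    """Formula (9.9) with R' = R: the mean of R_g^{-1} A R_g over the group."""
    size = len(a)
    total = [[Fraction(0)] * size for _ in range(size)]
    for r in GROUP:
        # The inverse of a permutation matrix is its transpose.
        term = matmul(matmul(transpose(r), a), r)
        total = [[total[i][j] + term[i][j] for j in range(size)]
                 for i in range(size)]
    return [[entry / len(GROUP) for entry in row] for row in total]

A = [[1, 2, 0],
     [0, 1, 0],
     [0, 0, 4]]  # hypothetical operator, chosen only for illustration
A_TILDE = average(A)

commutes_after = matmul(R_X, A_TILDE) == matmul(A_TILDE, R_X)
commutes_before = matmul(R_X, A) == matmul(A, R_X)
```

The averaged matrix commutes with the generator, hence with the whole group, although the original matrix does not, and the trace is unchanged as Proposition (9.10) predicts.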

Here is a sample calculation. Let $G = C_3 = \{1, x, x^2\}$, and let $\rho = \rho'$ be the
regular representation (Section 6) of G, so that $V = \mathbb{C}^3$ and

$R_x = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$.

Let T be the linear operator whose matrix is B. Then the matrix of $\tilde{T}$ is

$\tilde{B} = \frac{1}{3}(IBI + R_x^{-1}BR_x + R_x^{-2}BR_x^2) = \frac{1}{3}(B + R_x^2BR_x + R_xBR_x^2) = \frac{1}{3}\begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{bmatrix}$.

Or, let T be the linear operator whose matrix is the permutation matrix P
corresponding to the transposition y = (1 2). The average over the group is a sum of the three
transpositions: $(y + x^{-1}yx + x^{-2}yx^2)/3 = (y + xy + x^2y)/3$. In this case,

$\tilde{P} = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$.

Note that $\tilde{B}$ and $\tilde{P}$ commute with $R_x$ as claimed [see (9.4)], though the original
matrices P and B do not.

We will now prove the orthogonality relations, Theorem (5.9a). We saw in
Section 6 that formula (5.10) is a consequence of these relations.

Let $\chi, \chi'$ be two nonisomorphic irreducible characters, corresponding to
representations $\rho, \rho'$ of G on V, V'. Using the rule $\chi'(g^{-1}) = \overline{\chi'(g)}$, we can rewrite
the orthogonality $\langle \chi', \chi \rangle = 0$ to be proved as

(9.11)   $\frac{1}{N}\sum_g \chi'(g^{-1})\chi(g) = 0$.

Now Schur's Lemma asserts that every G-invariant linear transformation $V \to V'$
is zero. In particular, the linear transformation $\tilde{T}$ which we obtain by averaging any
linear transformation T is zero. Taking into account formula (9.9), this proves the
following lemma:

(9.12) Lemma. Let R, R' be nonisomorphic irreducible representations of G. Then

$\sum_g R'_{g^{-1}} A R_g = 0$

for every matrix A of the appropriate shape. □

Let’s warm up by checking orthogonality in the case that $\rho$ and $\rho'$ have
dimension 1. In this case, $R_g$, $R'_g$ are $1 \times 1$ matrices, that is, scalars, and $\chi(g) = R_g$.
If we set $A = 1$, then except for the factor $1/N$, (9.12) becomes (9.11), and we are
done.

Lemma (9.12) also implies orthogonality in higher dimensions, but only after a
small computation. Let us denote the entries of a matrix $M$ by $(M)_{ij}$, as we did in
Section 7 of Chapter 4. Then $\chi(g) = \operatorname{trace} R_g = \sum_i (R_g)_{ii}$. So $\langle \chi', \chi \rangle$ expands to

(9.13)    $\langle \chi', \chi \rangle = \frac{1}{N} \sum_g \sum_{i,j} (R'_{g^{-1}})_{ii}\,(R_g)_{jj}.$
We may reverse the order of summation. So to prove that $\langle \chi', \chi \rangle = 0$, it suffices to
show that for all $i, j$,

(9.14)    $\sum_g (R'_{g^{-1}})_{ii}\,(R_g)_{jj} = 0.$

The proof of the following lemma is elementary: 


(9.15) Lemma. Let $M, N$ be matrices and let $P = M e_{\alpha\beta} N$, where $e_{\alpha\beta}$ is a matrix
unit of suitable size. The entries of $P$ are $(P)_{ij} = (M)_{i\alpha}(N)_{\beta j}$. □
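
Lemma (9.15) can be confirmed on a small case by brute force. The sketch below is only an illustration; the matrices `M` and `N` and the helper names are hypothetical choices of ours, and indices are 0-based as is usual in Python. It checks the identity $(M e_{\alpha\beta} N)_{ij} = (M)_{i\alpha}(N)_{\beta j}$ for every choice of $\alpha, \beta, i, j$.

```python
def matmul(a, b):
    """Multiply an (r x s) matrix by an (s x t) matrix, both lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matrix_unit(rows, cols, alpha, beta):
    """Matrix with a 1 in position (alpha, beta) and zeros elsewhere."""
    return [[int((i, j) == (alpha, beta)) for j in range(cols)]
            for i in range(rows)]


def lemma_holds(m, n):
    """Check (m e_ab n)_ij == m_ia * n_bj for all a, b, i, j."""
    for alpha in range(len(m[0])):
        for beta in range(len(n)):
            e = matrix_unit(len(m[0]), len(n), alpha, beta)
            p = matmul(matmul(m, e), n)
            for i in range(len(m)):
                for j in range(len(n[0])):
                    if p[i][j] != m[i][alpha] * n[beta][j]:
                        return False
    return True


M = [[1, 2, 3],
     [4, 5, 6]]   # a 2 x 3 matrix
N = [[7, 8],
     [9, 10]]     # a 2 x 2 matrix, so the matrix units are 3 x 2

RESULT = lemma_holds(M, N)
```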

We substitute $e_{ij}$ for $A$ in Lemma (9.12) and apply Lemma (9.15), obtaining

$0 = (0)_{ij} = \sum_g (R'_{g^{-1}}\, e_{ij}\, R_g)_{ij} = \sum_g (R'_{g^{-1}})_{ii}\,(R_g)_{jj},$

as required. This shows that $\langle \chi', \chi \rangle = 0$ if $\chi$ and $\chi'$ are characters of
nonisomorphic irreducible representations.

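Next, suppose that $\chi = \chi'$. We have to show that $\langle \chi, \chi \rangle = 1$. Averaging $A$ as
in (9.9) need not give zero now, but according to Schur’s Lemma, it gives a scalar
matrix:

(9.16)    $\frac{1}{N} \sum_g R_{g^{-1}} A R_g = \tilde{A} = aI.$

By Proposition (9.10), $\operatorname{trace} \tilde{A} = \operatorname{trace} A$, and $\operatorname{trace} \tilde{A} = da$, where $d = \dim \rho$. So

(9.17)    $a = \operatorname{trace} A / d.$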

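We set $A = e_{ij}$ in (9.16) and apply Lemma (9.15) again, obtaining

(9.18)    $(aI)_{ij} = \frac{1}{N} \sum_g (R_{g^{-1}}\, e_{ij}\, R_g)_{ij} = \frac{1}{N} \sum_g (R_{g^{-1}})_{ii}\,(R_g)_{jj},$

where $a = (\operatorname{trace} e_{ij})/d$. The left-hand side of (9.18) is zero if $i \neq j$ and is equal to
$1/d$ if $i = j$. This shows that the terms with $i \neq j$ in (9.13) vanish, and that

$\langle \chi, \chi \rangle = \frac{1}{N} \sum_g \sum_i (R_{g^{-1}})_{ii}\,(R_g)_{ii} = \sum_i \Big[ \frac{1}{N} \sum_g (R_{g^{-1}})_{ii}\,(R_g)_{ii} \Big] = \sum_i \frac{1}{d} = 1.$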


This completes the proof that the irreducible characters $\chi_1, \chi_2, \ldots$ are orthonormal.
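
As a concrete instance of orthonormality, one can evaluate the hermitian form on the irreducible characters of $D_3$. The sketch below assumes the standard character table of $D_3$ (classes of $1$, $x$, $y$ of sizes 1, 2, 3) as given data; the names `CLASS_SIZES`, `CHARACTERS`, and `inner` are ours. It computes $\langle \chi, \chi' \rangle = \frac{1}{N} \sum_g \overline{\chi(g)}\,\chi'(g)$ by summing over conjugacy classes.

```python
from fractions import Fraction

# Assumed data: the standard character table of D_3, one value per class.
CLASS_SIZES = [1, 2, 3]          # classes of 1, x, y
CHARACTERS = {
    "chi1": [1, 1, 1],           # trivial representation
    "chi2": [1, 1, -1],          # sign representation
    "chi3": [2, -1, 0],          # two-dimensional representation
}


def inner(chi_a, chi_b):
    """Hermitian form (1/N) * sum over g of conj(chi_a(g)) * chi_b(g)."""
    order = sum(CLASS_SIZES)
    total = sum(size * a.conjugate() * b
                for size, a, b in zip(CLASS_SIZES, chi_a, chi_b))
    return Fraction(total, order)


GRAM = {(p, q): inner(CHARACTERS[p], CHARACTERS[q])
        for p in CHARACTERS for q in CHARACTERS}
```

The table `GRAM` is the identity matrix exactly when the three characters are orthonormal.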

We still have to show that the number of irreducible characters is equal to the
number of conjugacy classes, or, equivalently, that the irreducible characters span
the space $\mathcal{C}$ of class functions. Let the subspace they span be $\mathcal{H}$. Then [Chapter 7
(2.15)] $\mathcal{C} = \mathcal{H} \oplus \mathcal{H}^{\perp}$. So we must show that $\mathcal{H}^{\perp} = 0$, or that a class function $\varphi$
which is orthogonal to every character is zero.

Assume a class function $\varphi$ is given. So $\varphi$ is a complex-valued function on $G$
which is constant on conjugacy classes. Let $\chi$ be the character of a representation $\rho$,
and consider the linear operator $T\colon V \to V$ defined by

(9.19)    $T = \frac{1}{N} \sum_g \overline{\varphi(g)}\, \rho_g.$

Its trace is

(9.20)    $\operatorname{trace} T = \frac{1}{N} \sum_g \overline{\varphi(g)}\, \chi(g) = \langle \varphi, \chi \rangle = 0,$

because $\varphi$ is orthogonal to $\chi$.

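(9.21) Lemma. The operator $T$ defined by (9.19) is $G$-invariant.

Proof. We have to show (9.2) $\rho_h \circ T = T \circ \rho_h$, or $T = \rho_h^{-1} \circ T \circ \rho_h$, for
every $h \in G$. Let $g'' = h^{-1}gh$. Then as $g$ runs over the group $G$, so does $g''$, and of
course $\rho_h^{-1} \rho_g \rho_h = \rho_{g''}$. Also $\varphi(g) = \varphi(g'')$ because $\varphi$ is a class function. Therefore

$\rho_h^{-1}\, T\, \rho_h = \frac{1}{N} \sum_g \overline{\varphi(g)}\, \rho_h^{-1} \rho_g \rho_h = \frac{1}{N} \sum_{g''} \overline{\varphi(g'')}\, \rho_{g''} = T,$

as required. □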

Now if $\rho$ is irreducible as well, then Schur’s Lemma (9.6b) applies and shows
that $T = cI$. Since $\operatorname{trace} T = 0$ (9.20), it follows that $T = 0$. Any representation $\rho$
is a direct sum of irreducible representations, and (9.19) is compatible with direct
sums. Therefore $T = 0$ in every case.

We apply this to the case that $\rho = \rho^{\mathrm{reg}}$ is the regular representation. The
vector space is $V(G)$. We compute $T(1)$, where 1 denotes the identity element of $G$. By
definition of the regular representation, $\rho_g(1) = g$. So

(9.22)    $0 = T(1) = \frac{1}{N} \sum_g \overline{\varphi(g)}\, \rho_g(1) = \frac{1}{N} \sum_g \overline{\varphi(g)}\, g.$

Since the elements of $G$ are a basis for $V = V(G)$, this shows that $\overline{\varphi(g)} = 0$ for all
$g$, hence that $\varphi = 0$. □

10. REPRESENTATIONS OF THE GROUP $SU_2$

Much of what was done in Sections 6 to 9 carries over without change to continuous
representations of compact groups $G$, once a translation-invariant (Haar) measure $dg$
has been found. One just replaces summation by an integral over the group. However,
there will be infinitely many irreducible representations if $G$ is not finite.

When we speak of a representation $\rho$ of a compact group, we shall always
mean a continuous homomorphism to $GL(V)$, where $V$ is a finite-dimensional complex
vector space. The character $\chi$ of $\rho$ is then a continuous, complex-valued function
on $G$, which is constant on each conjugacy class. (It is a class function.)

For example, the identity map is a two-dimensional representation of $SU_2$. Its
character is the usual trace of $2 \times 2$ matrices. We will call this the standard
representation of $SU_2$. The conjugacy classes in $SU_2$ are the sets of matrices with given
trace $2c$. They correspond to the latitudes $\{x_1 = c\}$ in the 3-sphere $SU_2$ [Chapter 8
(2.8)]. Because of this, a class function on $SU_2$ depends only on $x_1$. So such a function
can be thought of as a continuous function on the interval $[-1, 1]$. In the notation
of Chapter 8 (2.5), the character of the standard representation of $SU_2$ is

$\chi(P) = \operatorname{trace} P = a + \bar{a} = 2x_1.$

Let $|G|$ denote the volume of our compact group $G$ with respect to the
measure $dg$:

(10.1)    $|G| = \int_G 1\, dg.$

Then the hermitian form which replaces (5.8) is

(10.2)    $\langle \chi, \chi' \rangle = \frac{1}{|G|} \int_G \overline{\chi(g)}\, \chi'(g)\, dg.$

With this definition, the orthogonality relations carry over. The proofs of the fol¬ 
lowing extensions to compact groups are the same as for finite groups: 

(10.3) Theorem.

(a) Every finite-dimensional representation of a compact group $G$ is a direct sum
of irreducible representations.

(b) Schur’s Lemma: Let $\rho, \rho'$ be irreducible representations, and let $T\colon V \to V'$
be a $G$-invariant linear transformation. Then either $T$ is an isomorphism, or
else $T = 0$. If $\rho = \rho'$, then $T$ is multiplication by a scalar.

(c) The characters of the irreducible representations are orthogonal with respect to
the form (10.2).

(d) If the characters of two representations are equal, then the representations are
isomorphic.

(e) A character $\chi$ has the property $\langle \chi, \chi \rangle = 1$ if and only if $\rho$ is irreducible.

(f) If $G$ is abelian, then every irreducible representation is one-dimensional. □

However, the other parts of Theorem (5.9) do not carry over directly. The
most significant change in the theory is in Section 6. If $G$ is connected, it cannot
operate continuously and nontrivially on a finite set, so finite-dimensional
representations cannot be obtained from actions on sets. In particular, the regular
representation is not finite-dimensional. Analytic methods are needed to extend that part of
the theory.

Since a Haar measure is easy to find for the groups $U_1$ and $SU_2$, we may
consider all of (10.3) proved for them.

Representations of the circle group $U_1$ are easy to describe, but they are
fundamental for an understanding of arbitrary compact groups. It will be convenient to
use additive and multiplicative notations interchangeably:

(10.4)    $SO_2(\mathbb{R}) \longleftrightarrow U_1$
          (rotation by $\theta$) $\longleftrightarrow e^{i\theta} = \alpha.$

(10.5) Theorem. The irreducible representations of $U_1$ are the $n$th power maps:

$U_1 \xrightarrow{\;n\;} U_1,$

sending $\alpha \mapsto \alpha^n$, or $\theta \mapsto n\theta$. There is one such representation for every
integer $n$.

Proof. By (10.3f), the irreducible representations are all one-dimensional,
and by (3.5), they are conjugate to unitary representations. Since $GL_1 = \mathbb{C}^{\times}$ is
abelian, conjugation is trivial, so a one-dimensional matrix representation is
automatically unitary. Hence an irreducible representation of $U_1$ is a continuous
homomorphism from $U_1$ to itself. We have to show that the only such homomorphisms are the
$n$th power maps.

(10.6) Lemma. The continuous homomorphisms $\psi\colon \mathbb{R}^+ \to \mathbb{R}^+$ are multiplication
by a scalar: $\psi(x) = cx$, for some $c \in \mathbb{R}$.

Proof. Let $\psi\colon \mathbb{R}^+ \to \mathbb{R}^+$ be a continuous homomorphism. We will show that
$\psi(x) = x\,\psi(1)$ for all $x$. This will show that $\psi$ is multiplication by $c = \psi(1)$.

Since $\psi$ is a homomorphism, $\psi(nr) = \psi(r + \cdots + r) = n\psi(r)$, for any real
number $r$ and any nonnegative integer $n$. In particular, $\psi(n) = n\psi(1)$. Also,
$\psi(-n) = -\psi(n) = -n\psi(1)$. Therefore $\psi(n) = n\psi(1)$ for every integer $n$. Next we
let $r = m/n$ be a rational number. Then $n\psi(r) = \psi(nr) = \psi(m) = m\psi(1)$. Dividing
by $n$, we find $\psi(r) = r\psi(1)$ for every rational number $r$. Since the rationals are
dense in $\mathbb{R}$ and $\psi$ is continuous, $\psi(x) = cx$ for all $x$. □

(10.7) Lemma. The continuous homomorphisms $\varphi\colon \mathbb{R}^+ \to U_1$ are of the form
$\varphi(x) = e^{icx}$ for some $c \in \mathbb{R}$.

Proof. If $\varphi$ is differentiable, this can be proved using the exponential map
of Section 5, Chapter 8. We prove it now for any continuous homomorphism. We
consider the exponential homomorphism $e\colon \mathbb{R}^+ \to U_1$ defined by $e(x) = e^{ix}$. This
homomorphism wraps the real line around the unit circle with period $2\pi$ [see
Figure (10.8)]. For any continuous function $\varphi\colon \mathbb{R} \to U_1$ such that $\varphi(0) = 1$, there is
a unique continuous lifting $\psi$ of this function to the real line such that $\psi(0) = 0$. In
other words, we can find a unique continuous function $\psi\colon \mathbb{R} \to \mathbb{R}$ such that
$\psi(0) = 0$ and $\varphi(x) = e(\psi(x))$ for all $x$. The lifting is constructed starting with the
definition $\psi(0) = 0$ and then extending $\psi$ a small interval at a time.

We claim that if $\varphi$ is a homomorphism, then its lifting $\psi$ is also a
homomorphism. If this is shown, then we will conclude that $\psi(x) = cx$ for some $c$ by (10.6),
hence that $\varphi(x) = e^{icx}$, as required.

The relation $\varphi(x+y) = \varphi(x)\varphi(y)$ implies that $e(\psi(x+y) - \psi(x) - \psi(y)) = 1$.
Hence $\psi(x+y) - \psi(x) - \psi(y) = 2\pi m$ for some integer $m$ which depends
continuously on $x$ and $y$. Varying continuously, $m$ must be constant, and setting $x = y = 0$
shows that $m = 0$. So $\psi$ is a homomorphism, as claimed. □

Now to complete the proof of Theorem (10.5), let $\rho\colon U_1 \to U_1$ be a continuous
homomorphism. Then $\varphi = \rho \circ e\colon \mathbb{R}^+ \to U_1$ is also a continuous homomorphism,
so $\varphi(x) = e^{icx}$ by (10.7). Moreover, $\varphi(2\pi) = \rho(1) = 1$, which is the case if and
only if $c$ is an integer, say $n$. Then $\rho(e^{ix}) = e^{inx} = (e^{ix})^n$. □

(10.8) Figure.
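
The two facts used at the end of this proof can be illustrated numerically. The sketch below is an illustration under floating-point tolerances, not a proof; the function names `rho`, `phi`, `is_multiplicative`, and `descends_to_circle` are ours. It checks that an $n$th power map is multiplicative on sample points of $U_1$, and that $x \mapsto e^{icx}$ takes the value 1 at $2\pi$ for sample integers $c$ but not for $c = 1/2$.

```python
import cmath
import math


def rho(n, alpha):
    """The nth power map on the unit circle."""
    return alpha ** n


def phi(c, x):
    """The homomorphism x -> e^{icx} from the additive reals to U_1."""
    return cmath.exp(1j * c * x)


def is_multiplicative(n, theta1, theta2, tol=1e-12):
    """Check rho(ab) = rho(a) rho(b) for a = e^{i theta1}, b = e^{i theta2}."""
    a, b = cmath.exp(1j * theta1), cmath.exp(1j * theta2)
    return abs(rho(n, a * b) - rho(n, a) * rho(n, b)) < tol


def descends_to_circle(c, tol=1e-9):
    """phi factors through x -> e^{ix} only if phi(2*pi) = 1."""
    return abs(phi(c, 2 * math.pi) - 1) < tol


CHECKS = {
    "power_3": is_multiplicative(3, 0.7, 2.1),
    "power_minus_2": is_multiplicative(-2, 0.7, 2.1),
    "c_5": descends_to_circle(5),
    "c_minus_4": descends_to_circle(-4),
    "c_half": descends_to_circle(0.5),
}
```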


Now let us examine the representations of the group $SU_2$. Again, there is an
infinite family of irreducible representations which arise naturally, and they turn out
to form a complete list. Let $V_n$ be the set of homogeneous polynomials of degree $n$ in
variables $u, v$. Such a polynomial will have the form

(10.9)    $f(u, v) = x_0 u^n + x_1 u^{n-1} v + \cdots + x_n v^n,$

where the coefficients $x_i$ are complex numbers. Obviously, $V_n$ is a vector space of
dimension $n + 1$, with basis $(u^n, u^{n-1}v, \ldots, v^n)$. The group $G = GL_2$ operates on $V_n$
in the following way: Let $P \in GL_2$, say

$P = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$

Let $P$ act on the basis $(u, v)$ of $V_1$ as usual:

$(u', v') = (u, v)P = (au + cv,\ bu + dv);$

define $\rho_{nP}$ by the rule

(10.10)    $u^i v^j \mapsto u'^{\,i} v'^{\,j}$ and
           $f(u, v) \mapsto x_0 u'^{\,n} + x_1 u'^{\,n-1} v' + \cdots + x_n v'^{\,n}.$

This is a representation

(10.11)    $\rho_n\colon G \to GL(V_n) \approx GL_{n+1}.$

The trivial representation is $\rho_0$, and the standard representation is $\rho_1$.

For example, the matrix of $\rho_{2P}$ is

(10.12)    $R_{2P} = \begin{bmatrix} a^2 & ab & b^2 \\ 2ac & ad + bc & 2bd \\ c^2 & cd & d^2 \end{bmatrix}.$

Its first column is the coordinate vector of $\rho_{2P}(u^2) = (au + cv)^2 = a^2u^2 +
2acuv + c^2v^2$, and so on.
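
The matrix of $\rho_{nP}$ can be computed mechanically for any $n$ by expanding $(au + cv)^{n-k}(bu + dv)^k$. The following is a minimal sketch of that computation for integer matrices; the function names `poly_mul`, `poly_pow`, `rho_matrix`, and `matmul` are ours, and a homogeneous polynomial is stored as its list of coefficients indexed by the power of $v$. It reproduces the $3 \times 3$ matrix of $\rho_{2P}$ and lets one test the homomorphism property $R_{n,PQ} = R_{n,P}\,R_{n,Q}$ on examples.

```python
def poly_mul(p, q):
    """Multiply coefficient lists; index = power of v (u is implicit)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def poly_pow(p, k):
    """Raise a coefficient list to the kth power (k >= 0)."""
    out = [1]
    for _ in range(k):
        out = poly_mul(out, p)
    return out


def rho_matrix(n, P):
    """Matrix of rho_n(P) on the basis (u^n, u^(n-1) v, ..., v^n)."""
    (a, b), (c, d) = P
    columns = []
    for k in range(n + 1):
        # u^(n-k) v^k is sent to (a u + c v)^(n-k) * (b u + d v)^k
        columns.append(poly_mul(poly_pow([a, c], n - k), poly_pow([b, d], k)))
    return [[columns[k][m] for k in range(n + 1)] for m in range(n + 1)]


def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


P = [[1, 2], [3, 4]]
Q = [[0, 1], [1, 1]]
R2 = rho_matrix(2, [[2, 3], [5, 7]])   # a, b, c, d = 2, 3, 5, 7
```

With $a, b, c, d = 2, 3, 5, 7$ the rows of `R2` are the rows $(a^2, ab, b^2)$, $(2ac, ad + bc, 2bd)$, $(c^2, cd, d^2)$ of the display.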
(10.13) Theorem. The representations $\rho_n$ ($n = 0, 1, 2, \ldots$) obtained by restricting
(10.11) to the subgroup $SU_2$ are the irreducible representations of $SU_2$.

Proof. We consider the subgroup $T$ of $SU_2$ of diagonal matrices

(10.14)    $\begin{bmatrix} \alpha & \\ & \bar{\alpha} \end{bmatrix},$

where $\alpha = e^{i\theta}$. This group is isomorphic to $U_1$. The conjugacy class of an arbitrary
unitary matrix $P$ contains two diagonal matrices, namely

$\begin{bmatrix} \lambda & \\ & \bar{\lambda} \end{bmatrix}$ and $\begin{bmatrix} \bar{\lambda} & \\ & \lambda \end{bmatrix},$

where $\lambda, \bar{\lambda}$ are the eigenvalues of $P$ [Chapter 7 (7.4)]. They coincide only when
$\lambda = \pm 1$. So every conjugacy class except $\{I\}$ and $\{-I\}$ intersects $T$ in a pair of
matrices.


(10.15) Proposition.

(a) A class function on $SU_2$ is determined by its restriction to the subgroup $T$.

(b) The restriction of a class function $\varphi$ to $T$ is an even function, which means that
$\varphi(\alpha) = \varphi(\bar{\alpha})$ or $\varphi(\theta) = \varphi(-\theta)$. □

Next, any representation $\rho$ of $SU_2$ restricts to a representation on the subgroup $T$,
and $T$ is isomorphic to $U_1$. The restriction to $T$ of an irreducible representation of
$SU_2$ will usually be reducible, but it can be decomposed into a direct sum of
irreducible representations of $T$. Therefore the restriction of the character $\chi$ to $T$ gives
us a sum of irreducible characters on $U_1$. Theorem (10.5) tells us what the
irreducible characters of $T$ are: They are the $n$th powers $e^{in\theta}$, $n \in \mathbb{Z}$. Therefore we find:

(10.16) Proposition. The restriction to $T$ of a character $\chi$ on $SU_2$ is a finite sum of
exponential functions $e^{in\theta}$. □

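Let us calculate the restriction to $T$ of the character $\chi_n$ of $\rho_n$ (10.11). The
matrix (10.14) acts on monomials by

$u^i v^j \mapsto (\alpha u)^i (\bar{\alpha} v)^j = \alpha^{i-j}\, u^i v^j.$

Therefore, its matrix, acting on the basis $(u^n, u^{n-1}v, \ldots, v^n)$, is the diagonal matrix
with diagonal entries $\alpha^n, \alpha^{n-2}, \ldots, \alpha^{-n}$, and the value of the character is

(10.17)    $\chi_n(\alpha) = \alpha^n + \alpha^{n-2} + \cdots + \alpha^{-n} = e^{in\theta} + e^{i(n-2)\theta} + \cdots + e^{-in\theta},$

or

(10.18)    $\chi_0 = 1$
           $\chi_1 = 2\cos\theta = e^{i\theta} + e^{-i\theta}$
           $\chi_2 = 1 + 2\cos 2\theta = e^{2i\theta} + 1 + e^{-2i\theta}$
           $\chi_3 = 2\cos 3\theta + 2\cos\theta$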
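
The list (10.18) can be compared numerically with the sum (10.17). The sketch below is only a floating-point spot check at a sample angle; the function name `chi` and the sample value `THETA` are ours. It evaluates $\chi_n$ as the sum of $e^{i(n-2k)\theta}$ for $k = 0, \ldots, n$ and records the differences from the cosine expressions.

```python
import cmath
import math


def chi(n, theta):
    """Character of rho_n at diag(alpha, conj(alpha)), alpha = e^{i theta}."""
    return sum(cmath.exp(1j * (n - 2 * k) * theta) for k in range(n + 1))


THETA = 0.9  # an arbitrary sample angle

DIFFERENCES = [
    abs(chi(0, THETA) - 1),
    abs(chi(1, THETA) - 2 * math.cos(THETA)),
    abs(chi(2, THETA) - (1 + 2 * math.cos(2 * THETA))),
    abs(chi(3, THETA) - (2 * math.cos(3 * THETA) + 2 * math.cos(THETA))),
]
```

At $\theta = 0$ the same function returns $n + 1$, the dimension of $V_n$, and it is even in $\theta$, as Proposition (10.15b) requires of a class function.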


Now let $\chi'$ be any irreducible character on $SU_2$. Its restriction to $T$ is even
(10.15b) and is a sum of exponentials $e^{in\theta}$ (10.16). To be even, $e^{in\theta}$ and $e^{-in\theta}$ must
occur with the same coefficient, so the character is a linear combination of the
functions $\cos n\theta = \tfrac{1}{2}(e^{in\theta} + e^{-in\theta})$. The functions (10.17) form a basis for the vector
space spanned by $\{\cos n\theta\}$. Therefore

(10.19)    $\chi' = \sum_i r_i \chi_i,$

where $r_i$ are rational numbers. A priori, this is true on $T$, but by (10.15a) it is also
true on all of $SU_2$. Clearing denominators and bringing negative terms to the left in
(10.19) yields a relation of the form

(10.20)    $m\chi' + \sum_j n_j \chi_j = \sum_k n_k \chi_k,$

where $m$, $n_j$, $n_k$ are positive integers and the index sets $\{j\}$, $\{k\}$ are disjoint. This relation
implies

$m\rho' \oplus \sum_j n_j \rho_j \approx \sum_k n_k \rho_k.$

Therefore $\rho'$ is one of the representations $\rho_k$. This completes the proof of Theorem
(10.13). □


We leave the obvious generalizations to the reader.

Israel Herstein 


EXERCISES 

1. Definition of a Group Representation

1. Let p be a representation of a group G. Show that det p is a one-dimensional representa¬ 
tion. 
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2. Suppose that G is a group with a faithful representation by diagonal matrices. Prove that 
G is abelian. 

3. Prove that the rule $S_n \to \mathbb{R}^{\times}$ defined by $p \mapsto \operatorname{sign} p$ is a one-dimensional
representation of the symmetric group.

4. Prove that the only one-dimensional representations of the symmetric group $S_5$ are the
trivial representation defined by $\rho(g) = 1$ for all $g$ and the sign representation.

5. (a) Write the standard representation of the octahedral group $O$ by rotations explicitly,
choosing a suitable basis for $\mathbb{R}^3$.

(b) Do the same for the dihedral group $D_n$.

*(c) Do the same for the icosahedral group $I$.

6. Show that the rule $\sigma(\theta) = \begin{bmatrix} \alpha & \alpha^2 - \alpha \\ 0 & \alpha^2 \end{bmatrix}$, $\alpha = e^{i\theta}$, is a representation of $SO_2$, when
a rotation in $SO_2$ is represented by its angle.

7. Let $H$ be a subgroup of index 2 of a group $G$, and let $\rho\colon G \to GL(V)$ be a
representation. Define $\rho'\colon G \to GL(V)$ by the rule $\rho'(g) = \rho(g)$ if $g \in H$, and $\rho'(g) = -\rho(g)$
if $g \notin H$. Prove that $\rho'$ is a representation of $G$.

8. Prove that every finite group $G$ has a faithful representation on a finite-dimensional
complex vector space.

9. Let $N$ be a normal subgroup of a group $G$. Relate representations of $G/N$ to
representations of $G$.

10. Choose three axes in $\mathbb{R}^3$ passing through the vertices of a regular tetrahedron centered at
the origin. (This is not an orthogonal coordinate system.) Find the coordinates of the
fourth vertex, and write the matrix representation of the tetrahedral group $T$ in this
coordinate system explicitly.

2. G-Invariant Forms and Unitary Representations

1. (a) Verify that the form $X^*BY$ (2.10) is $G$-invariant.

(b) Find an orthonormal basis for this form, and determine the matrix $P$ of change of
basis. Verify that $PAP^{-1}$ is unitary.

2. Prove the real analogue of (2.2): Let $R\colon G \to GL_n(\mathbb{R})$ be a representation of a finite
group $G$. There is a $P \in GL_n(\mathbb{R})$ such that $PR_gP^{-1}$ is orthogonal for every $g \in G$.

3. Let $\rho\colon G \to SL_2(\mathbb{R})$ be a faithful representation of a finite group by real $2 \times 2$ matrices
of determinant 1. Prove that $G$ is a cyclic group.

4. Determine all finite groups which have a faithful real two-dimensional representation.

5. Describe the finite groups $G$ which admit faithful real three-dimensional representations
with determinant 1.

6. Let $V$ be a hermitian vector space. Prove that the unitary operators on $V$ form a subgroup
$U(V)$ of $GL(V)$, and that a representation $\rho$ on $V$ has image in $U(V)$ if and only if the
form $\langle\ ,\ \rangle$ is $G$-invariant.

7. Let $\langle\ ,\ \rangle$ be a nondegenerate skew-symmetric form on a vector space $V$, and let $\rho$ be a
representation of a finite group $G$ on $V$.

(a) Prove that the averaging process (2.7) produces a $G$-invariant skew-symmetric form
on $V$.

(b) Does this prove that every finite subgroup of $GL_{2n}$ is conjugate to a subgroup of
$Sp_{2n}$?
8. (a) Let $R$ be the standard two-dimensional representation of $D_3$, with the triangle situated
so that the $x$-axis is a line of reflection. Rewrite this representation in terms of the
basis $x' = x$ and $y' = x + y$.

(b) Use the averaging process to obtain a $G$-invariant form from dot product in the
$(x', y')$-coordinates.

3. Compact Groups 


1. Prove that $dx/x$ is a Haar measure on the multiplicative group $\mathbb{R}^{\times}$.

2. (a) Let $P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}$ be a variable $2 \times 2$ matrix, and let $dV = dp_{11}\,dp_{12}\,dp_{21}\,dp_{22}$ denote
the ordinary volume form on $\mathbb{R}^{2 \times 2}$. Show that $(\det P)^{-2}\,dV$ is a Haar measure on
$GL_2(\mathbb{R})$.

(b) Generalize the results of (a).

*3. Show that the form $\dfrac{dx_2\,dx_3\,dx_4}{x_1}$ on the 3-sphere defines a Haar measure on $SU_2$. What
replaces this expression at points where $x_1 = 0$?

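4. Take the complex representation of $SO_2$ in $\mathbb{C}^2$ given by

$\sigma(\theta) = \begin{bmatrix} \alpha & \alpha^2 - \alpha \\ 0 & \alpha^2 \end{bmatrix}, \quad \alpha = e^{i\theta},$

and reduce it to a unitary representation by averaging the hermitian product on $\mathbb{C}^2$.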


4. G-Invariant Subspaces and Irreducible Representations

1. Prove that the standard three-dimensional representation of the tetrahedral group T is ir¬ 
reducible as a complex representation. 

2. Determine all irreducible representations of a cyclic group $C_n$.

3. Determine the representations of the icosahedral group $I$ which are not faithful.

4. Let $\rho$ be a representation of a finite group $G$ on a vector space $V$ and let $v \in V$.

(a) Show that averaging $gv$ over $G$ gives a vector $\tilde{v} \in V$ which is fixed by $G$.

(b) What can you say about this vector if $\rho$ is an irreducible representation?

5. Let $H \subset G$ be a subgroup, let $\rho$ be a representation of $G$ on $V$, and let $v \in V$. Let
$w = \sum_{h \in H} hv$. What can you say about the order of the $G$-orbit of $w$?

6. Consider the standard two-dimensional representation of the dihedral group D n as sym¬ 
metries of the n-gon. For which values of n is it irreducible as a complex representation? 

*7. Let $G$ be the dihedral group $D_3$, presented as in Chapter 5 (3.6).

(a) Let $\rho$ be an irreducible unitary representation of dimension 2. Show that there is an
orthonormal basis of $V$ such that $R_y = \begin{bmatrix} 1 & \\ & -1 \end{bmatrix}$.

(b) Assume that $R_y$ is as above. Use the defining relations $yx = x^2y$, $x^3 = 1$ to
determine the possibilities for $R_x$.

(c) Prove that all irreducible two-dimensional representations of $G$ are isomorphic.

(d) Let $\rho$ be any representation of $G$, and let $v \in V$ be an eigenvector for the operator
$\rho_x$. Show that $v$ is contained in a $G$-invariant subspace $W$ of dimension $\le 2$.

(e) Determine all irreducible representations of $G$.
5. Characters

1. Corollary (5.11) describes a basis for the space of class functions. Give another basis.

2. Find the decomposition of the standard two-dimensional rotation representation of the
cyclic group $C_n$ by rotations into irreducible representations.

3. Prove or disprove: Let $\chi$ be a character of a finite group $G$, and define $\bar{\chi}(g) = \overline{\chi(g)}$.
Then $\bar{\chi}$ is also a character of $G$.

4. Find the dimensions of the irreducible representations of the group O of rotations of a 
cube, the quaternion group, and the dihedral groups D 4 , D 5 , and D 6 . 

5. Describe how to produce a unitary matrix by adjusting the entries of a character table. 

6. Compare the character tables for the quaternion group and the dihedral group D 4 . 

7. Determine the character table for 

8. (a) Determine the character table for the groups $C_5$ and $D_5$.

(b) Decompose the restriction of each irreducible character of $D_5$ into irreducible
characters of $C_5$.

9. (a) Let $\rho$ be a representation of dimension $d$, with character $\chi$. Prove that the kernel of $\rho$
is the set of group elements such that $\chi(g) = d$.

(b) Show that if $G$ has a proper normal subgroup, then there is a representation $\rho$ such
that $\ker \rho$ is a proper subgroup.

*10. Let $\chi$ be the character of a representation $\rho$ of dimension $d$. Prove that $|\chi(g)| \le d$ for all
$g \in G$, and that if $|\chi(g)| = d$, then $\rho(g) = \zeta I$, for some root of unity $\zeta$.

11. Let $G' = G/N$ be a quotient group of a finite group $G$, and let $\rho'$ be an irreducible
representation of $G'$. Prove that the representation of $G$ defined by $\rho'$ is irreducible in two
ways: directly, and using Theorem (5.9).

12. Find the missing rows in the character table below: 


⑴ （ 3) ⑹ （ 6 ) ⑻ 

\ a b c d 


Xi 

X3 

X4 


1 1 - 1-1 

3-1 1-1 

3—1-1 1 


0 

0 


13. The table below is a partial character table of a finite group, in which $\omega = \tfrac{1}{2}(-1 + \sqrt{3}\,i)$
and $\gamma = \tfrac{1}{2}(-1 + \sqrt{7}\,i)$. The conjugacy classes are all there.


, ⑴ 
灼 1 


X2 

X3 


(3) (3) 


y y 

y y 


⑺ 

7 

0 

0 


⑺ 


0 

0 


(a) Determine the order of the group and the number and the dimensions of the irre¬ 
ducible representations. 

(b) Determine the remaining characters.

(c) Describe the group by generators and relations. 
*14. Describe the commutator subgroup of a group $G$ in terms of the character table.

*15. Below is a partial character table. One conjugacy class is missing.


⑴ （1) (2) (2) ⑶ 



1 

u 

V 

w 

/ 

X 

ATi 

1 

1 

1 

1 

1 

X2 

1 

1 

1 

1 

-1 


1 

一 1 

1 

一 1 

1 

l 

X4 

1 

-1 

1 

-1 

* 

—l 

Xs 

2 

-2 

-1 

-1 

0 


(a) Complete the table.

(b) Show that $u$ has order 2, $x$ has order 4, $w$ has order 6, and $v$ has order 3. Determine
the orders of the elements in the missing conjugacy class.

(c) Show that $v$ generates a normal subgroup.

(d) Describe the group.

*16. (a) Find the missing rows in the character table below.

(b) Show that the group $G$ with this character table has a subgroup $H$ of order 10, and
describe this subgroup as a union of conjugacy classes.

(c) Decide whether $H$ is $C_{10}$ or $D_5$.

(d) Determine the commutator subgroup of $G$.

(e) Determine all normal subgroups of $G$.

(f) Determine the orders of the elements $a, b, c, d$.

(g) Determine the number of Sylow 2-subgroups and the number of Sylow 5-subgroups
of this group.

              (1)   (4)   (5)   (5)   (5)
               1     a     b     c     d
$\chi_1$       1     1     1     1     1
$\chi_2$       1     1    -1    -1     1
$\chi_3$       1     1    -i     i    -1
$\chi_4$       1     1     i    -i    -1

*17. In the character table below, $\omega = \tfrac{1}{2}(-1 + \sqrt{3}\,i)$.






(1) 

⑹ 

⑺ 

(7) 

⑺ 

⑺ 

⑺ 


1 

a 

b 

c 

d 

e 

f 

义 i 

1 

1 

1 

1 

i 

1 

l 

Xi 

1 

1 

1 

c 

i 

i 

i 


1 

1 

1 

c 


l 


^4 

1 

1 

-1 

-c 


i 

c 


1 

1 

-1 

-l 

-c 

l 

c 


I 

1 

-1 

-l 

-1 

i 

1 

^7 

6 

-1 

0 

0 

0 

0 

0 


(a) Show that $G$ has a normal subgroup $N$ isomorphic to $D_7$, and determine the structure
of $G/N$.

(b) Decompose the restrictions of each character to $N$ into irreducible $N$-characters.

(c) Determine the numbers of Sylow $p$-subgroups, for $p = 2$, 3, and 7.

(d) Determine the orders of the representative elements $c, d, e, f$.

6. Permutation Representations and the Regular 
Representation 

1. Verify the values of the characters (6.4) and (6.5).

2. Use the orthogonality relations to decompose the character of the regular representation
for the tetrahedral group.

3. Show that the dimension of any irreducible representation of a group $G$ of order $N > 1$ is
at most $N - 1$.

4. Determine the character tables for the nonabelian groups of order 12.

5. Decompose the regular representation of $C_3$ into irreducible real representations.

6. Prove Corollary (6.8).

7. Let $\rho$ be the permutation representation associated to the operation of $D_3$ on itself by
conjugation. Decompose the character of $\rho$ into irreducible characters.

8. Let $S$ be a $G$-set, and let $\rho$ be the permutation representation of $G$ on the space $V(S)$.
Prove that the orbit decomposition of $S$ induces a direct sum decomposition of $\rho$.

9. Show that the standard representation of the symmetric group $S_n$ by permutation
matrices is the sum of a trivial representation and an irreducible representation.

*10. Let $H$ be a subgroup of a finite group $G$. Given an irreducible representation $\rho$ of $G$, we
may decompose its restriction to $H$ into irreducible $H$-representations. Show that every
irreducible representation of $H$ can be obtained in this way.

7. The Representations of the Icosahedral Group

1. Compute the characters of $I$, and use the orthogonality relations to determine
the remaining character $\chi_3$.

2. Decompose the representations of the icosahedral group on the sets of faces, edges, and
vertices into irreducible representations.

3. The group $S_5$ operates by conjugation on its subgroup $A_5$. How does this action operate
on the set of irreducible representations of $A_5$?

*4. Derive an algorithm for checking that a group is simple by looking at its character table.

5. Use the character table of the icosahedral group to prove that it is a simple group.

6. Let $H$ be a subgroup of index 2 of a group $G$, and let $\sigma\colon H \to GL(V)$ be a
representation. Let $a$ be an element of $G$ not in $H$. Define a conjugate representation
$\sigma'\colon H \to GL(V)$ by the rule $\sigma'(h) = \sigma(a^{-1}ha)$.

(a) Prove that $\sigma'$ is a representation of $H$.

(b) Prove that if $\sigma$ is the restriction to $H$ of a representation of $G$, then $\sigma'$ is isomorphic
to $\sigma$.

(c) Prove that if $b$ is another element of $G$ not in $H$, then the representation $\sigma''(h) =
\sigma(b^{-1}hb)$ is isomorphic to $\sigma'$.

7. (a) Choose coordinates and write the standard three-dimensional matrix representation 

of the octahedral group O explicitly. 
(b) Identify the five conjugacy classes in $O$, and find the orders of its irreducible
representations.

(c) The group $O$ operates on these sets:

(i) six faces of the cube

(ii) three pairs of opposite faces 

(iii) eight vertices 

(iv) four pairs of opposite vertices 

(v) six pairs of opposite edges 

(vi) two inscribed tetrahedra 

Identify the irreducible representations of O as summands of these representations, 
and compute the character table for O. Verify the orthogonality relations. 

(d) Decompose each of the representations (c) into irreducible representations. 

(e) Use the character table to find all normal subgroups of O. 

8. (a) The icosahedral group $I$ contains a subgroup $T$, the stabilizer of one of the cubes
[Chapter 6 (6.7)]. Decompose the restrictions to $T$ of the irreducible characters of $I$.

(b) Do the same thing as (a) with a subgroup $D_5$ of $I$.

9. Here is the character table for the group $G = PSL_2(\mathbb{F}_7)$, with $\gamma = \tfrac{1}{2}(-1 + \sqrt{7}\,i)$, $\gamma' =
\tfrac{1}{2}(-1 - \sqrt{7}\,i)$.

              (1)   (21)   (24)   (24)   (42)   (56)
               1     a      b      c      d      e
$\chi_1$       1     1      1      1      1      1
$\chi_2$       3    -1     $\gamma$    $\gamma'$     1      0
$\chi_3$       3    -1     $\gamma'$   $\gamma$      1      0
$\chi_4$       6     2     -1     -1      0      0
$\chi_5$       7    -1      0      0     -1      1
$\chi_6$       8     0      1      1      0     -1

(a) Use it to give two different proofs that this group is simple.

(b) Identify, so far as possible, the conjugacy classes of the elements

$\begin{bmatrix} 1 & 1 \\ & 1 \end{bmatrix}, \quad \begin{bmatrix} 2 & \\ & 4 \end{bmatrix},$

and find matrices which represent the remaining conjugacy classes.

(c) $G$ operates on the set of one-dimensional subspaces of $F^2$ ($F = \mathbb{F}_7$). Decompose the
associated character into irreducible characters.

8. One-dimensional Representations

1. Prove that the abelian characters of a group $G$ form a group.

2. Determine the character group for the Klein four group and for the quaternion group.

3. Let $A, B$ be matrices such that some power of each matrix is the identity and such that $A$
and $B$ commute. Prove that there is an invertible matrix $P$ such that $PAP^{-1}$ and $PBP^{-1}$ are
both diagonal.

4. Let $G$ be a finite abelian group. Show that the order of the character group is equal to the
order of $G$.

*5. Prove that the sign representation $p \mapsto \operatorname{sign} p$ and the trivial representation are the only
one-dimensional representations of the symmetric group $S_n$.

6. Let $G$ be a cyclic group of order $n$, generated by an element $x$, and let $\zeta = e^{2\pi i/n}$.

(a) Prove that the irreducible representations are $\rho_0, \ldots, \rho_{n-1}$, where $\rho_k\colon G \to \mathbb{C}^{\times}$ is
defined by $\rho_k(x) = \zeta^k$.

(b) Identify the character group of $G$.

(c) Verify the orthogonality relations for $G$ explicitly.

7. (a) Let $\varphi\colon G \to G'$ be a homomorphism of abelian groups. Define an induced
homomorphism $\hat{\varphi}\colon \hat{G}' \to \hat{G}$ between their character groups.

(b) Prove that $\hat{\varphi}$ is surjective if $\varphi$ is injective, and conversely.


9. Schur’s Lemma, and Proof of the Orthogonality Relations

1. Let $\rho$ be a representation of $G$. Prove or disprove: If the only $G$-invariant operators on $V$
are multiplication by a scalar, then $\rho$ is irreducible.

2. Let $\rho$ be the standard three-dimensional representation of $T$, and let $\rho'$ be the
permutation representation obtained from the action of $T$ on the four vertices. Prove by
averaging that $\rho$ is a summand of $\rho'$.

3. Let p = p' be the two-dimensional representation (4.6) of the dihedral group D 3 , and let 

A = ^ 1 . Use the averaging process to produce a G-invariant transformation from 

left multiplication by A . 



1 1-1 


- 卜 1 

4. (a) Show that R x — 

1 

1 -1 

， Ry = 

-1 1 
一 1 


defines a representation of Z) 3 . 


(b) We may regard the representation $\rho_2$ of (5.15) as a $1 \times 1$ matrix representation. Let
$T$ be the linear transformation $\mathbb{C}^1 \to \mathbb{C}^3$ whose matrix is $(1, 0, 0)^t$. Use the averaging
method to produce a $G$-invariant linear transformation from $T$, using $\rho_2$ and the
representation $R$ defined in (a).

(c) Do part (b), replacing $\rho_2$ by $\rho_1$ and $\rho_3$.

(d) Decompose $R$ explicitly into irreducible representations.


10. Representations of the Group $SU_2$

1. Determine the irreducible representations of the rotation group $SO_3$.

2. Determine the irreducible representations of the orthogonal group $O_2$.

3. Prove that the orthogonal representation $SU_2 \to SO_3$ is irreducible, and identify its
character in the list (10.18).

4. Prove that the functions (10.18) form a basis for the vector space spanned by $\{\cos n\theta\}$.

5. Left multiplication defines a representation of $SU_2$ on the space $\mathbb{R}^4$ with coordinates
$x_1, \ldots, x_4$, as in Chapter 8, Section 2. Decompose the associated complex representation
into irreducible representations.

6. (a) Calculate the four-dimensional volume of the 4-ball of radius $r$, $B^4 =
\{x_1^2 + x_2^2 + x_3^2 + x_4^2 \le r^2\}$, by slicing with three-dimensional slices.

(b) Calculate the three-dimensional volume of the 3-sphere $S^3$, again by slicing. It is
advisable to review the analogous computation of the area of a 2-sphere first. You
should find $\frac{d}{dr}$(volume of $B^4$) = (volume of $S^3$). If not, try again.

*7. Prove the orthogonality relations for the irreducible characters (10.17) of $SU_2$ by
integration over $S^3$.


Miscellaneous Problems 

*1. Prove that a finite simple group which is not of prime order has no nontrivial representa¬ 
tion of dimension 2. 


*2. Let $H$ be a subgroup of index 2 of a finite group $G$, and let $a$ be an element of $G$ not in
$H$, so that $aH$ is the second coset of $H$ in $G$. Let $S\colon H \to GL_n$ be a matrix
representation of $H$. Define a representation $\operatorname{ind} S\colon G \to GL_{2n}$ of $G$, called the induced
representation, as follows:

$(\operatorname{ind} S)_h = \begin{bmatrix} S_h & \\ & S_{a^{-1}ha} \end{bmatrix}, \qquad (\operatorname{ind} S)_{ah} = \begin{bmatrix} & S_{aha} \\ S_h & \end{bmatrix}.$

(a) Prove that $\operatorname{ind} S$ is a representation of $G$.

(b) Describe the character $\chi_{\operatorname{ind} S}$ of $\operatorname{ind} S$ in terms of the character $\chi_S$ of $S$.

(c) If $R\colon G \to GL_m$ is a representation of $G$, we may restrict it to $H$. We denote the
restriction by $\operatorname{res} R\colon H \to GL_m$. Prove that $\operatorname{res}(\operatorname{ind} S) \approx S \oplus S'$, where $S'$ is the
conjugate representation defined by $S'_h = S_{a^{-1}ha}$.

(d) Prove Frobenius reciprocity: $\langle \chi_{\operatorname{ind} S}, \chi_R \rangle = \langle \chi_S, \chi_{\operatorname{res} R} \rangle$.

(e) Use Frobenius reciprocity to prove that if $S$ and $S'$ are not isomorphic
representations, then the induced representation $\operatorname{ind} S$ of $G$ is irreducible. On the other hand, if
$S \approx S'$, then $\operatorname{ind} S$ is a sum of two irreducible representations $R, R'$.

*3. Let $H$ be a subgroup of index 2 of a group $G$, and let $R$ be a matrix representation of $G$.
Let $R'$ denote the conjugate representation, defined by $R'_g = R_g$ if $g \in H$, and
$R'_g = -R_g$ otherwise.

(a) Show that $R'$ is isomorphic to $R$ if and only if the character of $R$ is identically zero on
the coset $gH$, where $g \notin H$.

(b) Use Frobenius reciprocity to show that $\operatorname{ind}(\operatorname{res} R) \approx R \oplus R'$.

(c) Show that if $R$ is not isomorphic to $R'$, then $\operatorname{res} R$ is irreducible, and if these two
representations are isomorphic, then $\operatorname{res} R$ is a sum of two irreducible representations
of $H$.

*4. Using Frobenius reciprocity, derive the character table of $S_n$ from that of $A_n$ when
(a) $n = 3$, (b) $n = 4$, (c) $n = 5$.

*5. Determine the characters of the dihedral group $D_n$, using representations induced from
$C_n$.

6. (a) Prove that the only element of $SU_2$ of order 2 is $-I$.

(b) Consider the homomorphism $\varphi\colon SU_2 \to SO_3$. Let $A$ be an element of $SU_2$ such that
$\varphi(A) = \bar{A}$ has finite order $\bar{n}$ in $SO_3$. Prove that the order $n$ of $A$ is either $\bar{n}$ or $2\bar{n}$. Also
prove that if $\bar{n}$ is even, then $n = 2\bar{n}$.

*7. Let $G$ be a finite subgroup of $SU_2$, and let $\bar{G} = \varphi(G)$, where $\varphi\colon SU_2 \to SO_3$ is the
orthogonal representation (Chapter 8, Section 3). Prove the following.

(a) If $|G|$ is even, then $|G| = 2|\bar{G}|$ and $G = \varphi^{-1}(\bar{G})$.

(b) Either $G = \varphi^{-1}(\bar{G})$, or else $G$ is a cyclic group of odd order.

(c) Let $G$ be a cyclic subgroup of $SU_2$ of order $n$. Prove that $G$ is conjugate to the group
generated by $\begin{bmatrix} \zeta & 0 \\ 0 & \zeta^{-1} \end{bmatrix}$, where $\zeta = e^{2\pi i/n}$.

(d) Show that if $\bar{G}$ is the group $D_2$, then $G$ is the quaternion group. Determine the
matrix representation of the quaternion group $H$ as a subgroup of $SU_2$ with respect to a
suitable orthonormal basis in $\mathbb{C}^2$.

(e) If $\bar{G} = T$, prove that $G$ is a group of order 24 which is not isomorphic to the
symmetric group $S_4$.

*8. Let $\rho$ be an irreducible representation of a finite group $G$. How unique is the positive
definite $G$-invariant hermitian form?

*9. Let $G$ be a finite subgroup of $GL_n(\mathbb{C})$. Prove that if $\sum_g \operatorname{tr} g = 0$, then $\sum_g g = 0$.

*10. Let $\rho\colon G \to GL(V)$ be a two-dimensional representation of a finite group $G$, and
assume that 1 is an eigenvalue of $\rho_g$ for every $g \in G$. Prove that $\rho$ is a sum of two
one-dimensional representations.

*11. Let $\rho\colon G \to GL_n(\mathbb{C})$ be an irreducible representation of a finite group $G$. Given any
representation $\sigma\colon GL_n \to GL(V)$ of $GL_n$, we can consider the composition $\sigma \circ \rho$ as a
representation of $G$.

(a) Determine the character of the representation obtained in this way when $\sigma$ is left
multiplication of $GL_n$ on the space $\mathbb{C}^{n \times n}$ of $n \times n$ matrices. Decompose $\sigma \circ \rho$ into
irreducible representations in this case.

(b) Find the character of $\sigma \circ \rho$ when $\sigma$ is the operation of conjugation on $M_n(\mathbb{C})$.


Bitte vergiß alles, was Du auf der Schule gelernt hast;
denn Du hast es nicht gelernt.
(Please forget everything you learned in school; for you did not learn it.)

Edmund Landau


1. DEFINITION OF A RING 

The integers form our basic model for the concept of a ring. They are closed under
addition, subtraction, and multiplication, but not under division.

Before going to the abstract definition of a ring, we can get some examples by
considering subrings of the complex numbers. A subring of $\mathbb{C}$ is a subset which is
closed under addition, subtraction, and multiplication and which contains 1. Thus
any subfield [Chapter 3 (2.1)] is a subring. Another example is the ring of Gauss
integers, which are complex numbers of the form $a + bi$, where $a$ and $b$ are integers.
This ring is denoted by

(1.1)    $\mathbb{Z}[i] = \{a + bi \mid a, b \in \mathbb{Z}\}$.

The Gauss integers are the points of a square lattice in the complex plane.

We can form a subring $\mathbb{Z}[\alpha]$ analogous to the ring of Gauss integers, starting
with any complex number $\alpha$. We define $\mathbb{Z}[\alpha]$ to be the smallest subring of $\mathbb{C}$
containing $\alpha$, and we call it the subring generated by $\alpha$. It is not hard to describe this
ring. If a ring contains $\alpha$, then it contains all positive powers of $\alpha$ because it is
closed under multiplication. Also, it contains sums and differences of such powers,
and it contains 1. Therefore it contains every complex number $\beta$ which can be
expressed as a polynomial in $\alpha$ with integer coefficients:

(1.2)    $\beta = a_n\alpha^n + \cdots + a_1\alpha + a_0$, where $a_i \in \mathbb{Z}$.

On the other hand, the set of all such numbers is closed under the operations of
addition, subtraction, and multiplication, and it contains 1. So it is the subring generated
by $\alpha$. But $\mathbb{Z}[\alpha]$ will not be represented as a lattice in the complex plane in most
cases. For example, the ring $\mathbb{Z}[\frac{1}{2}]$ consists of the rational numbers which can be
expressed as a polynomial in $\frac{1}{2}$ with integer coefficients. These rational numbers can be
described simply as those whose denominator is a power of 2. They form a dense
subset of the real line.

A complex number $\alpha$ is called algebraic if it is a root of a polynomial with
integer coefficients, that is, if some expression of the form (1.2) is zero. For example,
$i + 3$, $1/7$, $7 + \sqrt[3]{2}$, and $\sqrt{3} + \sqrt{-5}$ are algebraic numbers.

If there is no polynomial with integer coefficients having $\alpha$ as a root, then $\alpha$ is
called a transcendental number. The numbers $e$ and $\pi$ are transcendental, though it
is not easy to prove that they are. If $\alpha$ is transcendental, then two distinct polynomial
expressions (1.2) must represent different complex numbers. In this case the
elements of the ring $\mathbb{Z}[\alpha]$ correspond bijectively to polynomials $p(x)$ with integer
coefficients, by the rule $p(x) \leftrightarrow p(\alpha)$.

When $\alpha$ is algebraic there will be many polynomial expressions (1.2) which
represent the same complex number. For example, when $\alpha = i$, the powers $\alpha^n$ take
the four values $\pm 1$, $\pm i$. Using the relation $i^2 = -1$, every expression (1.2) can be
reduced to one whose degree in $i$ is $\le 1$. This agrees with the description given
above for the ring of Gauss integers.

The two kinds of numbers, algebraic and transcendental, are somewhat
analogous to the two possibilities, finite and infinite, for a cyclic group [Chapter 2
(2.7)].

The definition of abstract ring is similar to that of field [Chapter 3 (2.3)], except
that multiplicative inverses are not required to exist:

(1.3) Definition. A ring $R$ is a set with two laws of composition $+$ and $\times$, called
addition and multiplication, which satisfy these axioms:

(a) With the law of composition $+$, $R$ is an abelian group, with identity denoted
by 0. This abelian group is denoted by $R^+$.

(b) Multiplication is associative and has an identity denoted by 1.

(c) Distributive laws: For all $a, b, c \in R$,

$(a + b)c = ac + bc$ and $c(a + b) = ca + cb$.

A subring of a ring is a subset which is closed under the operations of addition,
subtraction, and multiplication and which contains the element 1.

The terminology used is not completely standardized. Some people do not
require the existence of a multiplicative identity in a ring. We will study commutative
rings in most of this book, that is, rings satisfying the commutative law $ab = ba$ for
multiplication. So let us agree that the word ring will mean commutative ring with
identity, unless we explicitly mention noncommutativity. The two distributive laws
(c) are equivalent for commutative rings.

The ring $\mathbb{R}^{n \times n}$ of all $n \times n$ matrices with real entries is an important example
of a ring which is not commutative.

Besides subrings of $\mathbb{C}$, the most important rings are polynomial rings. Given
any ring $R$, a polynomial in $x$ with coefficients in $R$ is an expression of the form

(1.4)    $a_n x^n + \cdots + a_1 x + a_0$,

with $a_i \in R$. The set of these polynomials forms a ring which is usually denoted by
$R[x]$. We will discuss polynomial rings in the next section.

Here are some more examples of rings:

(1.5) Examples.

(a) Any field is a ring.

(b) The set $\mathcal{R}$ of continuous real-valued functions of a real variable $x$ forms a ring,
with addition and multiplication of functions:

$[f + g](x) = f(x) + g(x)$ and $[fg](x) = f(x)g(x)$.

(c) The zero ring $R = \{0\}$ consists of a single element 0.

In the definition of a field [Chapter 3 (2.3)], the multiplicative identity 1 is
required to lie in $F^\times = F - \{0\}$. Hence a field has at least two distinct elements,
namely 1 and 0. The relation $1 = 0$ has not been ruled out in a ring, but it occurs
only once:

(1.6) Proposition. Let $R$ be a ring in which $1 = 0$. Then $R$ is the zero ring.

Proof. We first note that $0a = 0$ for any element $a$ of a ring $R$. The proof is
the same as for vector spaces [Chapter 3 (1.6a)]. Assume that $1 = 0$ in $R$, and let $a$
be any element of $R$. Then $a = 1a = 0a = 0$. So every element of $R$ is 0, which
means that $R$ is the zero ring. □

Though multiplicative inverses are not required to exist in a ring, a particular
element may have an inverse, and the inverse is unique if it exists. Elements which
have multiplicative inverses are called units. For example, the units in the ring of
integers are 1 and $-1$, and the units in the ring $\mathbb{R}[x]$ of real polynomials are the
nonzero constant polynomials. Fields are rings which are not the zero ring and in
which every nonzero element is a unit.

The identity element 1 of a ring is always a unit, and any reference to "the"
unit element in $R$ refers to the identity. This is ambiguous terminology, but it is too
late to change it.

2. FORMAL CONSTRUCTION OF INTEGERS AND POLYNOMIALS

We learn that the ring axioms hold for the integers in elementary school. However,
let us look again in order to see what is required in order to write down proofs of
properties such as the associative and distributive laws. Complete proofs require a
fair amount of writing, and we will only make a start here. It is customary to begin
by defining addition and multiplication for positive integers. Negative numbers are
introduced later. This means that several cases have to be treated as one goes along,
which is boring, or else a clever notation has to be found to avoid such a case
analysis. We will content ourselves with a description of the operations on positive
integers. Positive integers are also called natural numbers.

The set $\mathbb{N}$ of natural numbers is characterized by these properties, called
Peano's axioms:

(2.1)

(a) The set $\mathbb{N}$ contains a particular element 1.

(b) Successor function: There is a map $\sigma\colon \mathbb{N} \to \mathbb{N}$ that sends every integer
$n \in \mathbb{N}$ to another integer, called the next integer or successor. This map is
injective, and for every $n \in \mathbb{N}$, $\sigma(n) \ne 1$.

(c) Induction axiom: Suppose that a subset $S$ of $\mathbb{N}$ has these properties:

(i) $1 \in S$;

(ii) if $n \in S$ then $\sigma(n) \in S$.

Then $S$ contains every natural number: $S = \mathbb{N}$.

The next integer $\sigma(n)$ will turn into $n + 1$ when addition is defined. At this stage
the notation $n + 1$ could be confusing. It is better to use a neutral notation, and we
will often denote the successor by $n'$ [$= \sigma(n)$]. Note that $\sigma$ is assumed to be
injective, so if $m, n$ are distinct natural numbers, that is, if $m \ne n$, then $m', n'$ are
distinct too.

The successor function allows us to use the natural numbers for counting, 
which is the basis of arithmetic. 

Property (c) is the induction property of the integers. Intuitively, it says that
the natural numbers are obtained from 1 by repeatedly taking the next integer:
$\mathbb{N} = \{1, 1', 1'', \ldots\}$ ($= \{1, 2, 3, \ldots\}$), that is, counting runs through all natural
numbers. This property is the formal basis of induction proofs.

Suppose that a statement $P_n$ is to be proved for every positive integer $n$, and let
$S$ be the set of integers $n$ such that $P_n$ is true. To say that $P_n$ is true for every $n$ is the
same as saying that $S = \mathbb{N}$. For this set $S$, the Induction Axiom translates into the
usual induction steps:

(2.2) (i) $P_1$ is true;

(ii) if $P_n$ is true then $P_{n'}$ is true.

We can also use Peano's axioms to make recursive definitions. The phrase
recursive definition, or inductive definition, refers to the definition of a sequence of
objects $C_n$ indexed by the natural numbers in which each object is defined in terms of
the preceding one. The function $C_n = x^n$ is an example. A recursive definition of
this function is

$x^1 = x$ and $x^{n'} = x^n x$.

The important points are as follows:

(2.3) (i) $C_1$ is defined;

(ii) a rule is given for determining $C_{n'}$ ($= C_{n+1}$) from $C_n$.

It is intuitively clear that (2.3) determines the sequence $C_n$ uniquely, though to
prove this from Peano's axioms is tricky. A natural approach to proving it would be
as follows: Let $S$ be the set of integers $n$ such that (2.3) determines $C_k$ for every
$k \le n$. Then (2.3i) shows that $1 \in S$. Also, (2.3ii) shows that if $n \in S$ then
$n' \in S$. The Induction Axiom shows that $S = \mathbb{N}$, hence that $C_n$ is uniquely defined
for each $n$. Unfortunately, the relation $\le$ is not included in Peano's axioms, so it
must be defined and its properties derived to start. A proof based on this approach is
therefore lengthy, so we won't carry one out here.

Given the set of positive integers and the ability to make recursive definitions,
we can define addition and multiplication of positive integers as follows:

(2.4) Addition: $m + 1 = m'$ and $m + n' = (m + n)'$.

Multiplication: $m \cdot 1 = m$ and $m \cdot n' = m \cdot n + m$.

In these definitions, we take an arbitrary integer $m$ and then define addition and
multiplication for that integer $m$ and for every $n$ recursively. In this way, $m + n$ and
$m \cdot n$ are defined for all $m$ and $n$.
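
The recursive definitions (2.4) can be tried out on a computer. The following Python sketch is only an illustration, not part of the formal development: the names `succ`, `add`, and `mul` are ours, Python's built-in integers stand in for the natural numbers, and the expression `n - 1` is used solely to recover the integer whose successor is `n`.

```python
def succ(n):
    """Successor function n -> n' on the natural numbers 1, 2, 3, ..."""
    return n + 1


def add(m, n):
    """m + n via (2.4): m + 1 = m' and m + n' = (m + n)'."""
    if n == 1:
        return succ(m)
    # Here n = k' with k = n - 1, so m + n = (m + k)'.
    return succ(add(m, n - 1))


def mul(m, n):
    """m * n via (2.4): m * 1 = m and m * n' = m * n + m."""
    if n == 1:
        return m
    # Here n = k' with k = n - 1, so m * n = m * k + m.
    return add(mul(m, n - 1), m)
```

For small arguments one can check the laws numerically, for instance that `add(add(a, b), n)` and `add(a, add(b, n))` agree. Such a check is of course no substitute for a proof by induction.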

The proofs of the associative, commutative, and distributive laws for the
integers are exercises in induction which might be called "Peano playing." We will
carry out two of the verifications here as samples.

Proof of the associative law for addition. We are to prove that
$(a + b) + n = a + (b + n)$ for all $a, b, n \in \mathbb{N}$. We first check the case $n = 1$ for all $a, b$. Three
applications of definition (2.4) give

$(a + b) + 1 = (a + b)' = a + b' = a + (b + 1)$.

Next, assume the associative law true for a particular value of $n$ and for all $a, b$.
Then we verify it for $n'$ as follows:

$(a + b) + n' = (a + b) + (n + 1)$    (definition)
$= ((a + b) + n) + 1$    (case $n = 1$)
$= (a + (b + n)) + 1$    (induction hypothesis)
$= a + ((b + n) + 1)$    (case $n = 1$)
$= a + (b + (n + 1))$    (case $n = 1$)
$= a + (b + n')$    (definition). □

Proof of the commutative law for multiplication, assuming that the commutative
law for addition has been proved. We first prove the following lemma:

(2.5)    $m' \cdot n = m \cdot n + n$.

The case $n = 1$ is clear: $m' \cdot 1 = m' = m + 1 = m \cdot 1 + 1$. So assume that (2.5)
is true for a particular $n$ and for all values of $m$. We check it for $n'$:

$m' \cdot n' = m' \cdot n + m' = m' \cdot n + (m + 1)$    (definition)
$= (m \cdot n + n) + (m + 1)$    (induction)
$= (m \cdot n + m) + (n + 1)$    (various laws for addition)
$= m \cdot n' + n'$    (definition).

Next, we check that $1 \cdot n = n$ by induction on $n$. Finally, we show that
$m \cdot n = n \cdot m$ by induction on $n$, knowing that $m \cdot 1 = m = 1 \cdot m$: Assume it true for $n$.
Then $m \cdot n' = m \cdot n + m = n \cdot m + m = n' \cdot m$, as required. □

The proofs of other properties of addition and multiplication follow similar lines.

We now turn to the definition of polynomial rings. We can define the notion of
a polynomial with coefficients in any ring $R$ to mean a linear combination of powers
of the variable:

(2.6)    $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$,

where $a_i \in R$. Such expressions are often called formal polynomials, to distinguish
them from polynomial functions. Every formal polynomial with real coefficients
determines a polynomial function on the real numbers.

The variable $x$ appearing in (2.6) is an arbitrary symbol, and the monomials $x^i$
are considered independent. This means that if

$g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0$

is another polynomial with coefficients in $R$, then $f(x)$ and $g(x)$ are equal if and only
if $a_i = b_i$ for all $i = 0, 1, 2, \ldots$.

The degree of a nonzero polynomial is the largest integer $k$ such that the
coefficient $a_k$ of $x^k$ is not zero. (The degree of the zero polynomial is considered
indeterminate.) The coefficient of highest degree of a polynomial which is not zero is
called its leading coefficient, and a monic polynomial is one whose leading
coefficient is 1.

The possibility that some of the coefficients of a polynomial may be zero creates
a nuisance. We have to disregard terms with zero coefficient:
$x^2 + 3 = 0x^3 + x^2 + 3$, for example. So the polynomial $f(x)$ has more than one
representation (2.6). One way to standardize notation is to list the nonzero coefficients only,
that is, to omit from (2.6) all terms $0x^i$. But zero coefficients may be produced in the
course of computations, and they will have to be thrown out. Another possibility is
to insist that the highest degree coefficient $a_n$ of (2.6) be nonzero and to list all those
of lower degree. The same problem arises. Such conventions therefore require a
discussion of special cases in the description of the ring structure. This is irritating,
because the ambiguity caused by zero coefficients is not an interesting point.

One way around the notational problem is to list the coefficients of all
monomials, zero or not. This isn't good for computation, but it allows efficient
verification of the ring axioms. So for the purpose of defining the ring operations,
we will write a polynomial in the standard form

(2.7)    $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots$,

where the coefficients $a_i$ are all in the ring $R$ and only finitely many of the
coefficients are different from zero. Formally, the polynomial (2.7) is determined by
its vector (or sequence) of coefficients $a_i$:

(2.8)    $a = (a_0, a_1, a_2, \ldots)$,

where $a_i \in R$ and all but a finite number of $a_i$ are zero. Every such vector
corresponds to a polynomial. In case $R$ is a field, these infinite vectors form the vector
space $Z$ with the infinite basis $e_i$ which was defined in Chapter 3 (5.2d). The vector
$e_i$ corresponds to the monomial $x^i$, and the monomials form a basis of the space of
all polynomials.

Addition and multiplication of polynomials mimic the familiar operations on
real polynomial functions. Let $f(x)$ be as above, and let

(2.9)    $g(x) = b_0 + b_1 x + b_2 x^2 + \cdots$

be another polynomial with coefficients in the same ring $R$, determined by the
vector $b = (b_0, b_1, \ldots)$. The sum of $f$ and $g$ is

(2.10)    $f(x) + g(x) = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + \cdots = \sum_k (a_k + b_k)x^k$,

which corresponds to vector addition: $a + b = (a_0 + b_0, a_1 + b_1, \ldots)$.

The product of two polynomials $f, g$ is computed by multiplying term by term
and collecting coefficients of the same degree in $x$. If we expand the product using
the distributive law, but without collecting terms, we obtain

(2.11)    $f(x)g(x) = \sum_{i,j} a_i b_j x^{i+j}$.

Note that there are finitely many nonzero coefficients $a_i b_j$. This is a correct formula,
but the right side is not in the standard form (2.7) because the same monomial $x^n$
appears many times, once for each pair $i, j$ of indices such that $i + j = n$. So
terms have to be collected to put the right side back into standard form. This leads to
the definition

$f(x)g(x) = p_0 + p_1 x + p_2 x^2 + \cdots$,

where

(2.12)    $p_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0 = \sum_{i+j=k} a_i b_j$.

However, it may be desirable to defer the collection of terms for a while when
making computations.

(2.13) Proposition. There is a unique commutative ring structure on the set of
polynomials $R[x]$ having these properties:

(a) Addition of polynomials is vector addition (2.10).

(b) Multiplication of monomials is given by the rule (2.12).

(c) The ring $R$ is a subring of $R[x]$, when the elements of $R$ are identified with the
constant polynomials.

The proof of this proposition is notationally unpleasant without having any
interesting features, so we omit it. □
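
Formulas (2.10) and (2.12) translate directly into operations on coefficient vectors. The sketch below is an illustration under our own conventions: a polynomial is a Python list `[a_0, a_1, a_2, ...]` of integer coefficients, and the function names `poly_add` and `poly_mul` are hypothetical. Trailing zero coefficients are not trimmed, which mirrors the remark that zero coefficients may be produced in the course of computations.

```python
def poly_add(a, b):
    """Vector addition (2.10) of coefficient lists [a_0, a_1, ...]."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))  # pad the shorter list with zero coefficients
    b = b + [0] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def poly_mul(a, b):
    """Product (2.12): p_k is the sum of a_i * b_j over all i + j = k."""
    if not a or not b:
        return []  # the empty list stands for the zero polynomial
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj  # collect the term a_i b_j x^(i+j)
    return p
```

For example, `poly_mul([1, 1], [1, -1])` computes $(1 + x)(1 - x)$ and returns `[1, 0, -1]`, that is, $1 - x^2$.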

Polynomials are fundamental to the theory of rings, and we must also consider
polynomials, such as $x^2y^2 + 4x^3 - 3x^2y - 4y^2 + 2$, in several variables. There is
no major change in the definitions.

Let $x_1, \ldots, x_n$ be variables. A monomial is a formal product of these variables,
of the form

$x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$,

where the exponents $i_\nu$ are nonnegative integers. The $n$-tuple $(i_1, \ldots, i_n)$ of exponents
determines the monomial. Such an $n$-tuple is called a multi-index, and vector
notation $i = (i_1, \ldots, i_n)$ for multi-indices is very convenient. Using it, we may write the
monomial symbolically as

(2.14)    $x^i = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$.

The monomial $x^0$, where $0 = (0, \ldots, 0)$, is denoted by 1.

A polynomial with coefficients in a ring $R$ is a finite linear combination of
monomials, with coefficients in $R$. Using the shorthand notation (2.14), any polynomial
$f(x) = f(x_1, \ldots, x_n)$ can be written in exactly one way in the form

(2.15)    $f(x) = \sum_i a_i x^i$,

where $i$ runs through all multi-indices $(i_1, \ldots, i_n)$, the coefficients $a_i$ are in $R$, and
only finitely many of these coefficients are different from zero.

A polynomial which is the product of a monomial by a nonzero element of $R$ is
also called a monomial. Thus

(2.17)    $m = r x^i$

is a monomial if $r \in R$ is not zero and if $x^i$ is as above (2.14). A monomial can be
thought of as a polynomial which has exactly one nonzero coefficient.

Using multi-index notation, formulas (2.10) and (2.12) define addition and
multiplication of polynomials in several variables, and the analogue of Proposition
(2.13) is true.

The ring of polynomials with coefficients in $R$ is denoted by one of the symbols

(2.16)    $R[x_1, \ldots, x_n]$ or $R[x]$,

where the symbol $x$ is understood to refer to the set of variables $(x_1, \ldots, x_n)$. When
no set of variables has been introduced, $R[x]$ refers to the polynomial ring in one
variable $x$.


3. HOMOMORPHISMS AND IDEALS

A homomorphism $\varphi\colon R \to R'$ from one ring to another is a map which is
compatible with the laws of composition and which carries 1 to 1, that is, a map such that

(3.1)    $\varphi(a + b) = \varphi(a) + \varphi(b)$, $\varphi(ab) = \varphi(a)\varphi(b)$, $\varphi(1_R) = 1_{R'}$,

for all $a, b \in R$. An isomorphism of rings is a bijective homomorphism. If there is
an isomorphism $R \to R'$, the two rings are said to be isomorphic.

A word about the third part of (3.1) is in order. The assumption that a
homomorphism $\varphi$ is compatible with addition implies that it is a group homomorphism
$R^+ \to R'^+$. We know that a group homomorphism carries the identity to the
identity, so $\varphi(0) = 0$. But $R$ is not a group with respect to $\times$, and we can't conclude that
$\varphi(1) = 1$ from compatibility with multiplication. So the condition $\varphi(1) = 1$ must
be listed separately. For example, the zero map $R \to R'$ sending all elements of $R$
to zero is compatible with $+$ and $\times$, but it doesn't send 1 to 1 unless $1 = 0$ in $R'$.
The zero map isn't a ring homomorphism unless $R'$ is the zero ring [see (1.6)].

The most important ring homomorphisms are those obtained by evaluating
polynomials. Evaluation of real polynomials at a real number $a$ defines a
homomorphism

(3.2)    $\mathbb{R}[x] \to \mathbb{R}$, sending $p(x) \rightsquigarrow p(a)$.

We can also evaluate real polynomials at a complex number such as $i$, to obtain a
homomorphism

(3.3)    $\mathbb{R}[x] \to \mathbb{C}$, sending $p(x) \rightsquigarrow p(i)$.

The general formulation of the principle of evaluation of polynomials is this:

(3.4) Proposition. Substitution Principle: Let $\varphi\colon R \to R'$ be a ring
homomorphism.

(a) Given an element $\alpha \in R'$, there is a unique homomorphism $\Phi\colon R[x] \to R'$
which agrees with the map $\varphi$ on constant polynomials and which sends $x \rightsquigarrow \alpha$.

(b) More generally, given elements $\alpha_1, \ldots, \alpha_n \in R'$, there is a unique
homomorphism $\Phi\colon R[x_1, \ldots, x_n] \to R'$ from the polynomial ring in $n$ variables to $R'$,
which agrees with $\varphi$ on constant polynomials and which sends $x_\nu \rightsquigarrow \alpha_\nu$ for
$\nu = 1, \ldots, n$.

Proof. With vector notation for indices, the proof of (b) is the same as that of
(a). Let us denote the image of an element $r \in R$ in $R'$ by $r'$. Using the fact that $\Phi$
is a homomorphism which restricts to $\varphi$ on $R$ and sends $x_\nu$ to $\alpha_\nu$, we find that it acts
on a polynomial $f(x) = \sum r_i x^i$ by sending

(3.5)    $\sum r_i x^i \rightsquigarrow \sum \varphi(r_i)\alpha^i = \sum r_i' \alpha^i$.

In other words, $\Phi$ acts on the coefficients of a polynomial as $\varphi$, and it substitutes $\alpha$
for $x$. Since this formula describes $\Phi$ for us, we have proved the uniqueness of the
substitution homomorphism. To prove its existence, we take this formula as the
definition of $\Phi$, and we show that this map is a homomorphism $R[x] \to R'$. It is
easy to show that $\Phi$ sends 1 to 1 and that it is compatible with addition of
polynomials. Compatibility with multiplication can be checked using formula (2.11):

$\Phi(fg) = \Phi(\sum a_i b_j x^{i+j}) = \sum \Phi(a_i b_j x^{i+j}) = \sum_{i,j} a_i' b_j' \alpha^{i+j}$
$= (\sum_i a_i' \alpha^i)(\sum_j b_j' \alpha^j) = \Phi(f)\Phi(g)$. □
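
The substitution map (3.5) in one variable is easy to experiment with. The following Python sketch is an illustration only: polynomials are coefficient lists `[r_0, r_1, ...]`, the names `substitute` and `poly_mul` are ours, and the coefficient map `phi` defaults to the identity (the caller is assumed to pass a genuine ring homomorphism). It lets one check numerically, in particular cases, that substitution is compatible with multiplication.

```python
def poly_mul(a, b):
    """Product of coefficient lists, collecting terms as in (2.12)."""
    if not a or not b:
        return []
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return p


def substitute(coeffs, alpha, phi=lambda r: r):
    """Apply (3.5): send sum r_i x^i to sum phi(r_i) * alpha^i."""
    total = 0
    power = 1  # alpha^0
    for r in coeffs:
        total += phi(r) * power
        power *= alpha
    return total


f = [1, 2]     # 1 + 2x
g = [3, 0, 1]  # 3 + x^2
# Evaluation at the integer 5 and at the complex number i (written 1j in Python).
lhs_int, rhs_int = substitute(poly_mul(f, g), 5), substitute(f, 5) * substitute(g, 5)
lhs_cx, rhs_cx = substitute(poly_mul(f, g), 1j), substitute(f, 1j) * substitute(g, 1j)
```

Here `lhs_int` and `rhs_int` are both 308, and `lhs_cx` and `rhs_cx` are both `2 + 4j`, as the proposition predicts for these two evaluations.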

Here is an example of the Substitution Principle in which the coefficient ring $R$
changes: Let $\psi\colon R \to R_1$ be a ring homomorphism. Composing $\psi$ with the inclusion
of $R_1$ as a subring of $R_1[x]$, we obtain a homomorphism $\varphi\colon R \to R_1[x]$. The
Substitution Principle asserts that there is a unique extension of $\varphi$ to a
homomorphism $\Phi\colon R[x] \to R_1[x]$ which sends $x \rightsquigarrow x$. This is the map which operates on
the coefficients of a polynomial, leaving the variable $x$ fixed. If we denote $\psi(a)$ by
$a'$, then it sends a polynomial $a_n x^n + \cdots + a_1 x + a_0$ to $a_n' x^n + \cdots + a_1' x + a_0'$.

An important case is the homomorphism $\mathbb{Z} \to \mathbb{F}_p$, where $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ is the
field with $p$ elements. This map extends to a homomorphism

(3.6)    $\mathbb{Z}[x] \to \mathbb{F}_p[x]$, sending
$f(x) = a_n x^n + \cdots + a_0 \rightsquigarrow \bar{a}_n x^n + \cdots + \bar{a}_0 = \bar{f}(x)$,

where $\bar{a}_i$ denotes the residue class of $a_i$ modulo $p$. It is natural to call the polynomial
$\bar{f}(x)$ the residue of $f(x)$ modulo $p$.
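
The map (3.6) simply reduces each coefficient modulo $p$. As a small illustration, here is a Python sketch under our own conventions: coefficient lists `[a_0, a_1, ...]`, residue classes represented by the integers $0, \ldots, p - 1$, and the hypothetical names `residue` and `poly_mul`.

```python
def poly_mul(a, b):
    """Product of coefficient lists, collecting terms as in (2.12)."""
    if not a or not b:
        return []
    p = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p[i + j] += ai * bj
    return p


def residue(coeffs, p):
    """Residue of an integer polynomial modulo p, as in (3.6)."""
    # Python's % returns a value in 0..p-1 for positive p, even for negative a.
    return [a % p for a in coeffs]


f = [-4, 7, 3]  # 3x^2 + 7x - 4
g = [6, 0, -1]  # -x^2 + 6
f_bar = residue(f, 5)
# Reducing the product agrees with multiplying the residues and reducing again.
same = residue(poly_mul(f, g), 5) == residue(poly_mul(residue(f, 5), residue(g, 5)), 5)
```

Here `f_bar` is `[1, 2, 3]`, the residue $3x^2 + 2x + 1$ of $f$ modulo 5, and `same` is `True`, which is one instance of the compatibility of (3.6) with multiplication.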

The Substitution Principle is also an efficient way to prove that various
constructions of polynomial rings are equivalent; the isomorphism

$R[x, y] \approx R[x][y]$

is a typical example. Here the right side stands for the ring of polynomials in $y$
whose coefficients are polynomials in $x$. The statement that these rings are
isomorphic is a formalization of the procedure of collecting terms of like degree in $y$ in a
polynomial $f(x, y)$, to write it as a polynomial in $y$. For example,

$x^2y^2 + 4x^3 - 3x^2y - 4y^2 + 2 = (x^2 - 4)y^2 - (3x^2)y + (4x^3 + 2)$.

(3.7) Corollary. Let $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$ denote sets of variables.
There is a unique isomorphism $R[x, y] \to R[x][y]$ which is the identity on $R$ and
which sends the variables to themselves.

Proof. Note that $R$ is a subring of $R[x]$, and that $R[x]$ is a subring of $R[x][y]$.
So $R$ is also a subring of $R[x][y]$. Consider the inclusion map $\varphi\colon R \to R[x][y]$. The
Substitution Principle (3.4) tells us that there is a unique homomorphism
$\Phi\colon R[x, y] \to R[x][y]$ which extends this map and sends the variables $x_\mu, y_\nu$ wherever
we like. So we can send the variables to themselves. The map $\Phi$ thus constructed is
the required isomorphism. We can show that it has an inverse by using the
Substitution Principle once more: We note that $R[x]$ is a subring of $R[x, y]$, so we can extend
the inclusion map $\psi\colon R[x] \to R[x, y]$ to a map $\Psi\colon R[x][y] \to R[x, y]$ by sending
$y_j$ to itself. The composed homomorphism $\Psi\Phi\colon R[x, y] \to R[x, y]$ is the identity
on $R$ and on $\{x_\mu, y_\nu\}$. By the uniqueness of the substitution homomorphism, $\Psi\Phi$ is
the identity map. Similarly, $\Phi\Psi$ is the identity. This proves that $\Phi$ is an
isomorphism. □

Since a real polynomial $f(x)$ can be evaluated at a real number, it defines a
polynomial function on the real line. The term polynomial is often used to refer to a
function obtained in this way, and not much danger is involved in doing this,
because we can recover the polynomial from its function:

(3.8) Proposition. Let $\mathcal{R}$ denote the ring of continuous real-valued functions on
$\mathbb{R}^n$. The map $\varphi\colon \mathbb{R}[x_1, \ldots, x_n] \to \mathcal{R}$ sending a polynomial to its associated
polynomial function is an injective homomorphism.

Proof. The existence of this homomorphism follows from the Substitution
Principle. Let us prove injectivity. It is enough to show that if the function
associated to a polynomial $f(x)$ is the zero function, then $f(x)$ is the zero polynomial. Let
the associated function be $\tilde{f}(x)$. If $\tilde{f}(x)$ is identically zero, then all its derivatives are
zero too. On the other hand, we can differentiate a formal polynomial by using the
rule for differentiating polynomial functions. If some coefficient of our polynomial $f$
is not zero, then the constant term of a suitable derivative will be nonzero too. So
that derivative will not vanish at the origin. Therefore $\tilde{f}(x)$ can't be the zero
function. □

Another important example of a ring homomorphism is the map from the
integers to an arbitrary ring:

(3.9) Proposition. There is exactly one homomorphism

$\varphi\colon \mathbb{Z} \to R$

from the ring of integers to an arbitrary ring $R$. It is the map defined by
$\varphi(n) =$ "$n$ times $1_R$" $= 1_R + \cdots + 1_R$ ($n$ times) if $n > 0$, and $\varphi(-n) = -\varphi(n)$.

Sketch of Proof. Let $\varphi\colon \mathbb{Z} \to R$ be a homomorphism. By the definition of
homomorphism, $\varphi(1) = 1_R$, and $\varphi(n + 1) = \varphi(n) + \varphi(1)$. So $\varphi$ is determined on
the natural numbers by the recursive definition

$\varphi(1) = 1$ and $\varphi(n') = \varphi(n) + 1$,

where $'$ denotes the successor function (2.1b). This formula, together with
$\varphi(-n) = -\varphi(n)$ if $n > 0$ and $\varphi(0) = 0$, determines $\varphi$ uniquely. So the above map is the only
possible one. To give a formal proof that this map is a homomorphism, we must go
back to Peano's axioms. Let us verify that $\varphi$ is compatible with addition of positive
integers. To prove that $\varphi(m + n) = \varphi(m) + \varphi(n)$, we note that this is true when
$n = 1$, by the definition of $\varphi$. Assume it true for all $m$ and some particular $n$. Then
we prove it for all $m$ and for $n'$:

$\varphi(m + n') = \varphi((m + n) + 1)$    (properties of addition of integers)
$= \varphi(m + n) + 1$    (definition of $\varphi$)
$= \varphi(m) + \varphi(n) + 1$    (induction hypothesis)
$= \varphi(m) + \varphi(n')$    (definition of $\varphi$).

By induction, $\varphi(m + n) = \varphi(m) + \varphi(n)$ for all $m$ and $n$. We leave the proof of
compatibility with multiplication of positive integers as an exercise. □
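
The map of Proposition (3.9) can be written down for any ring whose operations are available as functions. The Python sketch below is an illustration under stated assumptions: the ring is described by hypothetical parameters `one`, `zero`, `add`, and `neg`, and as a sample ring we take the integers modulo 6, represented by $0, \ldots, 5$.

```python
def make_phi(one, zero, add, neg):
    """Build the map Z -> R of (3.9) from the ring data of R."""
    def phi(n):
        if n < 0:
            return neg(phi(-n))        # phi(-n) = -phi(n)
        result = zero                  # phi(0) = 0
        for _ in range(n):
            result = add(result, one)  # phi(n') = phi(n) + 1
        return result
    return phi


M = 6  # sample ring: the integers modulo 6
phi = make_phi(one=1, zero=0,
               add=lambda a, b: (a + b) % M,
               neg=lambda a: (-a) % M)
```

With this choice, `phi(7)` is 1 and `phi(-1)` is 5. One can also confirm for small $m, n$ that `phi(m + n)` equals `(phi(m) + phi(n)) % M` and that `phi(m * n)` equals `(phi(m) * phi(n)) % M`, in line with the proposition.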

This proposition allows us to identify the images of the integers in an arbitrary
ring $R$. Thus we can interpret the symbol 3 as the element $1 + 1 + 1$ in $R$, and we
can interpret an integer polynomial such as $3x^2 + 2x$ as an element of the
polynomial ring $R[x]$.

We now go back to an arbitrary ring homomorphism $\varphi\colon R \to R'$. The kernel
of $\varphi$ is defined in the same way as the kernel of a group homomorphism:

$\ker \varphi = \{a \in R \mid \varphi(a) = 0\}$.

As you will recall, the kernel of a group homomorphism is a subgroup, and in
addition it is normal [Chapter 2 (4.9)]. Similarly, the kernel of a ring homomorphism is
closed under the ring operations of addition and multiplication, and it also has a
stronger property than closure under multiplication:

(3.10)    If $a \in \ker \varphi$ and $r \in R$, then $ra \in \ker \varphi$.

For if $\varphi(a) = 0$, then $\varphi(ra) = \varphi(r)\varphi(a) = \varphi(r)0 = 0$. On the other hand, $\ker \varphi$
does not contain the unit element 1 of $R$, and so the kernel is not a subring, unless it
is the whole ring $R$. (If $1 \in \ker \varphi$, then $r = r1 \in \ker \varphi$ for all $r \in R$.) Moreover,
if $\ker \varphi = R$, then $\varphi$ is the zero map, and by what was said above, $R'$ is the zero
ring.

For example, let $\varphi$ be the homomorphism $\mathbb{R}[x] \to \mathbb{R}$ defined by evaluation at
the real number 2. Then $\ker \varphi$ is the set of polynomials which have 2 as a root. It
can also be described as the set of polynomials divisible by $x - 2$.
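The two descriptions of this kernel can be compared on small examples. The following Python sketch is only an illustration: the coefficient-list representation and the function names `evaluate` and `divide_by_linear` are assumptions made here, not notation from the text. It divides a polynomial by $x - 2$ and checks that the remainder is the value at 2, so the remainder vanishes exactly when 2 is a root.

```python
def evaluate(coeffs, a):
    """Value of the polynomial at a; coeffs[i] is the coefficient of x**i."""
    value = 0
    for c in reversed(coeffs):      # Horner's rule, highest coefficient first
        value = value * a + c
    return value


def divide_by_linear(coeffs, a):
    """Divide a nonempty coefficient list by x - a (synthetic division).

    Returns (quotient coefficients, remainder), lowest degree first.
    """
    partial = []
    carry = 0
    for c in reversed(coeffs):
        carry = carry * a + c
        partial.append(carry)
    remainder = partial.pop()       # the last partial value is the remainder
    partial.reverse()
    return partial, remainder


f = [6, -5, 1]                      # x^2 - 5x + 6 = (x - 2)(x - 3), in the kernel
g = [1, 0, 1]                       # x^2 + 1, not in the kernel
f_quotient, f_remainder = divide_by_linear(f, 2)
g_quotient, g_remainder = divide_by_linear(g, 2)
```

Here `f` leaves remainder 0 and quotient $x - 3$, while `g` leaves the remainder $g(2) = 5$.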

The property of the kernel of a ring homomorphism, that it is closed under
multiplication by arbitrary elements of the ring, is abstracted in the concept of an
ideal. An ideal $I$ of a ring $R$ is, by definition, a subset of $R$ with these properties:

(3.11)

(i) $I$ is a subgroup of $R^+$;

(ii) If $a \in I$ and $r \in R$, then $ra \in I$.

This peculiar term “ideal” is an abbreviation of “ideal element,” which was formerly
used in number theory. We will see in Chapter 11 how the term arose. Property (ii)
implies that an ideal is closed under multiplication, but it is stronger. A good way to
think of properties (i) and (ii) together is this equivalent formulation:

(3.12)   $I$ is not empty, and a linear combination $r_1 a_1 + \cdots + r_k a_k$
of elements $a_i \in I$ with coefficients $r_i \in R$ is in $I$.

In any ring $R$, the set of multiples of a particular element $a$, or equivalently,
the set of elements divisible by $a$, forms an ideal called the principal ideal generated
by $a$. This ideal will be denoted in one of the following ways:

(3.13)   $(a) = aR = Ra = \{ra \mid r \in R\}$.

Thus the kernel of the homomorphism $\mathbb{R}[x] \to \mathbb{R}$ defined by evaluation at 2 may
be denoted by $(x - 2)$ or by $(x - 2)\mathbb{R}[x]$. Actually the notation $(a)$ for a principal
ideal, though convenient, is ambiguous because the ring is not mentioned. For
instance, $(x - 2)$ may stand for an ideal in $\mathbb{R}[x]$ or in $\mathbb{Z}[x]$, depending on the
circumstances. When there are several rings around, a different notation may be
preferable.

We may also consider the ideal $I$ generated by a set of elements $a_1, \ldots, a_n$ of $R$,
which is defined to be the smallest ideal containing the elements. It can be described
as the set of all linear combinations

(3.14)   $r_1 a_1 + \cdots + r_n a_n$,

with coefficients $r_i$ in the ring. For if an ideal contains $a_1, \ldots, a_n$, then (3.12) tells us
that it contains every linear combination of these elements. On the other hand, the
set of linear combinations is closed under addition, subtraction, and multiplication
by elements of $R$. Hence it is the ideal $I$. This ideal is often denoted by

(3.15)   $(a_1, \ldots, a_n) = \{r_1 a_1 + \cdots + r_n a_n \mid r_i \in R\}$.

For example, if $R$ is the ring $\mathbb{Z}[x]$ of integer polynomials, the notation $(2, x)$
stands for the ideal of linear combinations of 2 and $x$ with integer polynomial
coefficients. This ideal can also be described as the set of all integer polynomials
$f(x)$ whose constant term is divisible by 2. It is the kernel of the homomorphism
$\mathbb{Z}[x] \to \mathbb{Z}/2\mathbb{Z}$ defined by $f(x) \mapsto$ (residue of $f(0)$ (modulo 2)).

For the rest of this section, we will describe ideals in some simple cases. In
any ring the set consisting of zero alone is an ideal, called the zero ideal. It is
obviously a principal ideal, as is the whole ring. Being generated as an ideal by the
element 1, $R$ is called the unit ideal, often denoted by $(1)$. The unit ideal is the only
ideal which contains a unit. An ideal $I$ is said to be proper if it is not $(0)$ or $(1)$.
Fields can be characterized by the fact that they have no proper ideals:

(3.16) Proposition.

(a) Let $F$ be a field. The only ideals of $F$ are the zero ideal and the unit ideal.

(b) Conversely, if a ring $R$ has exactly two ideals, then $R$ is a field.
Let us prove (b). Assume that $R$ has exactly two ideals. The properties that
distinguish fields among rings are that $1 \neq 0$ and that every nonzero element $a \in R$ has a
multiplicative inverse. As we saw above, $1 = 0$ occurs only in the zero ring, which
has one element. This ring has only one ideal. Since our ring has two ideals, $1 \neq 0$
in $R$. The two ideals $(1)$ and $(0)$ are different, so they are the only two ideals of $R$.

We now show that every nonzero element of $R$ has an inverse. Let $a \in R$ be a
nonzero element, and consider the principal ideal $(a)$. Then $(a) \neq (0)$ because
$a \in (a)$. Therefore $(a) = (1)$. This implies that 1 is a multiple, say $ra$, of $a$. The
equation $ar = 1$ shows that $a$ has an inverse. □

(3.17) Corollary. Let $F$ be a field and let $R'$ be a nonzero ring. Every
homomorphism $\varphi\colon F \to R'$ is injective.

Proof. We apply (3.16). If $\ker \varphi = (1)$, then $\varphi$ is the zero map. But the zero
map isn’t a homomorphism because $R'$ isn’t the zero ring. Therefore $\ker \varphi = (0)$. □

It is also easy to determine the ideals in the ring of integers. 

(3.18) Proposition. Every ideal in the ring $\mathbb{Z}$ of integers is a principal ideal.

This is because every subgroup of the additive group $\mathbb{Z}^+$ of integers is of the form
$n\mathbb{Z}$ [Chapter 2 (2.3)], and these subgroups are precisely the principal ideals. □

The characteristic of a ring $R$ is the nonnegative integer $n$ which generates the
kernel of the homomorphism $\varphi\colon \mathbb{Z} \to R$ (3.9). This means that $n$ is the smallest
positive integer such that “$n$ times $1_R$” $= 0$ or, if the kernel is $(0)$, the characteristic
is zero (see Chapter 3, Section 2). Thus $\mathbb{R}$, $\mathbb{C}$, and $\mathbb{Z}$ have characteristic zero, while
the field $\mathbb{F}_p$ with $p$ elements has characteristic $p$.
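For a ring of positive characteristic, the definition can be followed literally by adding the unit element to itself until the sum is zero. The sketch below does this for a product of rings $\mathbb{Z}/m\mathbb{Z}$; the choice of product rings as test cases and the function name `characteristic` are assumptions made for illustration only.

```python
def characteristic(moduli):
    """Characteristic of the product ring Z/m1 x ... x Z/mk (each m >= 1).

    Adds the unit element (1, ..., 1) to itself until the sum is zero.
    This terminates only because these rings are finite; a ring of
    characteristic zero would make the loop run forever.
    """
    one = tuple(1 % m for m in moduli)
    total = one
    n = 1
    while any(total):
        total = tuple((t + u) % m for t, u, m in zip(total, one, moduli))
        n += 1
    return n


char_F5 = characteristic((5,))          # the field with 5 elements
char_product = characteristic((4, 6))   # Z/4 x Z/6
```

In the second case the kernel of $\mathbb{Z} \to \mathbb{Z}/4 \times \mathbb{Z}/6$ is generated by the least common multiple 12.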

The proof that every ideal of the ring of integers is principal can be adapted to
show that every ideal in the polynomial ring $F[x]$ is principal. To prove this, we
need division with remainder for polynomials.

(3.19) Proposition. Let $R$ be a ring and let $f, g$ be polynomials in $R[x]$. Assume
that the leading coefficient of $f$ is a unit in $R$. (This is true, for instance, if $f$ is a
monic polynomial.) Then there are polynomials $q, r \in R[x]$ such that

$g(x) = f(x)q(x) + r(x)$,

and such that the degree of the remainder $r$ is less than the degree of $f$ or else $r = 0$.
This division with remainder can be proved by induction on the degree of $g$. □

Note that when the coefficient ring is a field, the assumption that the leading
coefficient of $f$ is a unit is satisfied, provided only that there is a leading coefficient,
that is, that $f \neq 0$.
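The induction on the degree of $g$ amounts to the usual long division: cancel the leading term of $g$ with a multiple of $f$ and repeat. A minimal sketch for monic $f$ with integer coefficients follows; the coefficient-list representation (lowest degree first) and the name `poly_divmod_monic` are assumptions made here for illustration.

```python
def poly_divmod_monic(g, f):
    """Divide g by a monic polynomial f over the integers.

    Polynomials are coefficient lists, lowest degree first. Returns (q, r)
    with g = f*q + r, where r is [] (zero) or has degree less than deg f.
    """
    if not f or f[-1] != 1:
        raise ValueError("f must be monic")
    r = list(g)
    while r and r[-1] == 0:          # drop zero leading coefficients
        r.pop()
    n = len(f) - 1                   # degree of f
    q = [0] * max(len(r) - n, 0)
    while len(r) - 1 >= n:
        shift = len(r) - 1 - n
        lead = r[-1]
        q[shift] = lead              # next quotient term: lead * x^shift
        for i, c in enumerate(f):    # subtract lead * x^shift * f from r
            r[shift + i] -= lead * c
        while r and r[-1] == 0:
            r.pop()
    return q, r


g = [5, 2, 0, 1]                     # x^3 + 2x + 5
f = [1, 0, 1]                        # x^2 + 1
q, r = poly_divmod_monic(g, f)       # quotient x, remainder x + 5
```

Because $f$ is monic, no division of coefficients is ever needed, which is why the same procedure works over any ring.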

(3.20) Corollary. Let $g(x)$ be a monic polynomial in $R[x]$, and let $a$ be an
element of $R$ such that $g(a) = 0$. Then $x - a$ divides $g$ in $R[x]$. □
(3.21) Proposition. Let $F$ be a field. Every ideal in the ring $F[x]$ of polynomials
in a single variable $x$ is a principal ideal.

Proof. Let $I$ be an ideal of $F[x]$. Since the zero ideal is principal, we may
assume that $I \neq (0)$. The first step in finding a generator for a nonzero subgroup of $\mathbb{Z}$
is to choose its smallest positive element. Our substitute here is to choose a nonzero
polynomial $f$ in $I$ of minimal degree. We claim that $I$ is the principal ideal generated
by $f$. It follows from the definition of an ideal that the principal ideal $(f)$ is
contained in $I$. To prove that $I \subset (f)$, we use division with remainder to write
$g = fq + r$, where $r$ has lower degree than $f$, unless it is zero. Now if $g$ is in the
ideal $I$, then since $f \in I$ the definition of an ideal shows that $r = g - fq$ is in $I$ too.
Since $f$ has minimal degree among nonzero elements, the only possibility is that
$r = 0$. Thus $f$ divides $g$, as required. □

The proof of the following corollary is similar to that of (2.6) in Chapter 2.

(3.22) Corollary. Let $F$ be a field, and let $f, g$ be polynomials in $F[x]$ which are
not both zero. There is a unique monic polynomial $d(x)$ called the greatest common
divisor of $f$ and $g$, with the following properties:

(a) $d$ generates the ideal $(f, g)$ of $F[x]$ generated by the two polynomials $f, g$.

(b) $d$ divides $f$ and $g$.

(c) If $h$ is any divisor of $f$ and $g$, then $h$ divides $d$.

(d) There are polynomials $p, q \in F[x]$ such that $d = pf + qg$. □
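Properties (a) and (d) can be made concrete with the Euclidean algorithm, which the text does not spell out here; the following is a sketch under the assumption that $F = \mathbb{F}_5$ and that polynomials are stored as coefficient lists, lowest degree first. All function names are hypothetical. The three-argument `pow` with exponent $-1$ computes an inverse modulo a prime and requires Python 3.8 or later.

```python
P = 5  # work in F_5[x]; the prime is an arbitrary choice for this sketch


def trim(a):
    """Reduce coefficients mod P and drop zero leading coefficients."""
    a = [c % P for c in a]
    while a and a[-1] == 0:
        a.pop()
    return a


def add(a, b):
    n = max(len(a), len(b))
    return trim([(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
                 for i in range(n)])


def mul(a, b):
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def divmod_poly(g, f):
    """Division with remainder by a nonzero f: returns (q, r) with g = f*q + r."""
    g, f = trim(g), trim(f)
    inv = pow(f[-1], -1, P)          # leading coefficient is a unit in a field
    q = [0] * max(len(g) - len(f) + 1, 0)
    r = g
    while len(r) >= len(f):
        shift = len(r) - len(f)
        coef = (r[-1] * inv) % P
        q[shift] = coef
        r = trim([c - coef * f[i - shift] if 0 <= i - shift < len(f) else c
                  for i, c in enumerate(r)])
    return trim(q), r


def gcd_with_coefficients(f, g):
    """Return (d, p, q) with d monic and d = p*f + q*g; f, g not both zero."""
    r0, r1 = trim(f), trim(g)
    p0, p1 = [1], []                 # invariant: r0 = p0*f + q0*g, r1 = p1*f + q1*g
    q0, q1 = [], [1]
    while r1:
        quo, rem = divmod_poly(r0, r1)
        minus_quo = trim([-c for c in quo])
        r0, r1 = r1, rem
        p0, p1 = p1, add(p0, mul(minus_quo, p1))
        q0, q1 = q1, add(q0, mul(minus_quo, q1))
    scale = [pow(r0[-1], -1, P)]     # normalize so that d is monic
    return mul(scale, r0), mul(scale, p0), mul(scale, q0)


f = [-1, 0, 1]                       # x^2 - 1 = (x - 1)(x + 1)
g = [2, 3, 1]                        # x^2 + 3x + 2 = (x + 1)(x + 2)
d, p, q = gcd_with_coefficients(f, g)
```

For these inputs the greatest common divisor is $x + 1$, and the returned `p`, `q` satisfy $d = pf + qg$ in $\mathbb{F}_5[x]$.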

4. QUOTIENT RINGS AND RELATIONS IN A RING 

Let $I$ be an ideal of a ring $R$. The cosets of the additive subgroup $I^+$ of $R^+$ are the
subsets

$a + I$, $a \in R$.

It follows from what has been proved for groups that the set of cosets $\bar{R} = R/I$ is a
group under addition. It is also a ring:

(4.1) Theorem. Let $I$ be an ideal of a ring $R$.

(a) There is a unique ring structure on the set of cosets $\bar{R} = R/I$ such that the
canonical map $\pi\colon R \to \bar{R}$ sending $a \mapsto \bar{a} = a + I$ is a homomorphism.

(b) The kernel of $\pi$ is $I$.

Proof. This proof has already been carried out in the special case that $R$ is the
ring of integers (Chapter 2, Section 9). We want to put a ring structure on $\bar{R}$ with
the required properties, and if we forget about multiplication and consider only the
addition law, the proof has already been given [Chapter 2 (10.5)]. What is left to do
is to define multiplication. Let $x, y \in \bar{R}$, and say that $x = \bar{a} = a + I$ and $y = \bar{b} =
b + I$. We would like to define the product to be $xy = \overline{ab} = ab + I$. In contrast
with coset multiplication in a group [Chapter 2 (10.1)], the set of products

$P = \{rs \mid r \in a + I,\ s \in b + I\}$

is not always a coset of $I$. However, as in the case of the ring of integers, the set $P$ is
always contained in the single coset $ab + I$: If we write $r = a + u$ and $s = b + v$
with $u, v \in I$, then

$(a + u)(b + v) = ab + (av + bu + uv)$,

and since $I$ is an ideal, $av + bu + uv \in I$. This is all that is needed to define the
product coset: It is the coset which contains the set $P$. This coset is unique because
the cosets partition $R$. The proof of the remaining assertions closely follows the
pattern of Chapter 2, Section 9. □
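The remark that $P$ lies in one coset without always filling it can be seen already in $\mathbb{Z}$. The sketch below takes $I = 4\mathbb{Z}$ and $a = b = 2$, an example chosen here for illustration and not taken from the text, and inspects products of coset elements in a finite window of integers.

```python
def coset_window(a, n, bound):
    """Elements of the coset a + nZ lying in the interval [-bound, bound]."""
    return [x for x in range(-bound, bound + 1) if (x - a) % n == 0]


n, a, b = 4, 2, 2
products = {r * s
            for r in coset_window(a, n, 30)
            for s in coset_window(b, n, 30)}

# Every product lies in the single coset ab + 4Z = 4Z ...
all_in_one_coset = all((p - a * b) % n == 0 for p in products)
# ... but 0 belongs to that coset and is not a product, since no element
# of 2 + 4Z is zero. So the set P is strictly smaller than the coset.
misses_zero = 0 not in products
```

In fact every product $(2 + 4u)(2 + 4v)$ is congruent to 4 modulo 8, so $P$ misses half of the coset $4\mathbb{Z}$.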

As in Chapter 6 (8.4) and Chapter 2 (10.9), one can show the following:

(4.2) Proposition. Mapping property of quotient rings: Let $f\colon R \to R'$ be a ring
homomorphism with kernel $I$ and let $J$ be an ideal which is contained in $I$. Denote
the residue ring $R/J$ by $\bar{R}$.

(a) There is a unique homomorphism $\bar{f}\colon \bar{R} \to R'$ such that $\bar{f}\pi = f$:

$R \xrightarrow{\;\pi\;} \bar{R} = R/J \xrightarrow{\;\bar{f}\;} R', \qquad \bar{f} \circ \pi = f.$

(b) First Isomorphism Theorem: If $J = I$, then $\bar{f}$ maps $\bar{R}$ isomorphically to the
image of $f$. □

We will now describe the fundamental relationship between ideals in a 
quotient ring R/J and ideals in the original ring R. 

(4.3) Proposition. Correspondence Theorem: Let $\bar{R} = R/J$, and let $\pi$ denote the
canonical map $R \to \bar{R}$.

(a) There is a bijective correspondence between the set of ideals of $R$ which
contain $J$ and the set of all ideals of $\bar{R}$, given by

$I \mapsto \pi(I)$ and $\pi^{-1}(\bar{I}) \leftarrow\!\shortmid \bar{I}$.

(b) If $I \subset R$ corresponds to $\bar{I} \subset \bar{R}$, then $R/I$ and $\bar{R}/\bar{I}$ are isomorphic rings.

The second part of this proposition is often called the Third Isomorphism Theorem.
[There is also a Second Isomorphism Theorem (see Chapter 6, miscellaneous
exercise 7).]
Proof. To prove (a), we must check the following points:

(i) If $I$ is an ideal of $R$ which contains $J$, then $\pi(I)$ is an ideal of $\bar{R}$.

(ii) If $\bar{I}$ is an ideal of $\bar{R}$, then $\pi^{-1}(\bar{I})$ is an ideal of $R$.

(iii) $\pi^{-1}(\pi(I)) = I$ and $\pi(\pi^{-1}(\bar{I})) = \bar{I}$.

We know that the image of a subgroup is a subgroup [Chapter 2 (4.4)]. So to show
that $\pi(I)$ is an ideal of $\bar{R}$, we need only prove that it is closed under multiplication
by elements of $\bar{R}$. Let $\bar{r} \in \bar{R}$, and let $\bar{x} \in \pi(I)$. We write $\bar{r} = \pi(r)$ for some
$r \in R$, and $\bar{x} = \pi(x)$ for some $x \in I$. Then $\bar{r}\bar{x} = \pi(rx)$ and $rx \in I$. So
$\bar{r}\bar{x} \in \pi(I)$. Note that this proof works for all ideals $I$ of $R$. We do not need the
assumption that $I \supset J$ at this point. However, the fact that $\pi$ is surjective is essential.

Next, we denote the homomorphism $\bar{R} \to \bar{R}/\bar{I}$ by $\varphi$, and we consider the
composed homomorphism $R \xrightarrow{\;\pi\;} \bar{R} \xrightarrow{\;\varphi\;} \bar{R}/\bar{I}$. Since $\pi$ and $\varphi$ are surjective, so is
$\varphi \circ \pi$. Moreover, the kernel of $\varphi \circ \pi$ is the set of elements $r \in R$ such that
$\pi(r) \in \bar{I} = \ker \varphi$. By definition, this is $\pi^{-1}(\bar{I})$. Therefore $\pi^{-1}(\bar{I})$, being the kernel
of a homomorphism, is an ideal of $R$. This proves (ii). Also, the First Isomorphism
Theorem applies to the homomorphism $\varphi \circ \pi$ and shows that $R/\pi^{-1}(\bar{I})$ is isomorphic
to $\bar{R}/\bar{I}$. This proves part (b) of the proposition.

It remains to prove (iii); remember that $\pi^{-1}$ isn’t usually a map. The inclusions
$\pi^{-1}(\pi(I)) \supset I$ and $\pi(\pi^{-1}(\bar{I})) \subset \bar{I}$ are general properties of any map of sets and
for arbitrary subsets. Moreover, the equality $\pi(\pi^{-1}(\bar{I})) = \bar{I}$ holds for any surjective
map of sets. We omit the verification of these facts. The final point, that
$\pi^{-1}(\pi(I)) \subset I$, is the one which requires that $I \supset J$. Let $x \in \pi^{-1}(\pi(I))$. Then
$\pi(x) \in \pi(I)$, so there is an element $y \in I$ such that $\pi(y) = \pi(x)$. Since $\pi$ is a
homomorphism, $\pi(x - y) = 0$ and $x - y \in J = \ker \pi$. Since $y \in I$ and $J \subset I$,
this implies that $x \in I$, as required. □

The quotient construction has an important interpretation in terms of relations 
among elements in a ring $R$. Let us imagine performing a sequence of operations
$+$, $-$, $\times$ on some elements of $R$ to get a new element $a$. If the resulting element $a$ is
zero, we say that the given elements are related by the equation

(4.4)   $a = 0$.

For instance, the elements 2, 3, 6 of the ring $\mathbb{Z}$ are related by the equation
$2 \times 3 - 6 = 0$.

Now if the element $a$ is not zero, we may ask whether it is possible to modify
$R$ in such a way that (4.4) becomes true. We can think of this process as adding a
new relation, which will collapse the ring. For example, the relation $3 \times 4 - 5 = 0$
does not hold in $\mathbb{Z}$, because $3 \times 4 - 5 = 7$. But we can impose the relation $7 = 0$
on the integers. Doing so amounts to working modulo 7.

At this point we can forget about the procedure which led us to the particular
element $a$; let it be an arbitrary element of $R$. Now when we modify $R$ to impose the
relation $a = 0$, we want to keep the operations $+$ and $\times$, so we will have to accept
some consequences of this relation. For example, $ra = 0$ and $b + a = b$ are the
consequences of multiplying and adding given elements to both sides of $a = 0$.
Performing these operations in succession gives us the consequence

(4.5)   $b + ra = b$.

If we want to set $a = 0$, we must also set $b + ra = b$ for all $b, r \in R$. Theorem
(4.1) tells us that this is enough: There are no other consequences of (4.4). To see
this, note that if we fix an element $b$ but let $r$ vary, the set $\{b + ra\}$ is the coset
$b + (a)$, where $(a) = aR$ is the principal ideal generated by $a$. Setting $b + ra = b$
for all $r$ is the same as equating the elements of this coset. This is precisely what
happens when we pass from $R$ to the quotient ring $\bar{R} = R/(a)$. The elements of $\bar{R}$
are the cosets $\bar{b} = b + (a)$, and the canonical map $\pi\colon R \to \bar{R}$ carries all the
elements $b + ra$ in one coset to the same element $\bar{b} = \pi(b)$. So exactly the right
amount of collapsing has taken place in $\bar{R}$. Also, $\bar{a} = 0$, because $a$ is an element of
the ideal $(a)$, which is the kernel of $\pi$. So it is reasonable to view $\bar{R} = R/(a)$ as the
ring obtained by introducing the relation $a = 0$ into $R$.

If our element $a$ was obtained from some other elements by a sequence of ring
operations, as we supposed in (4.4), then the fact that $\pi$ is a homomorphism implies
that the same sequence of operations gives 0 in $\bar{R}$. Thus if $uv + w = a$ for some
$u, v, w \in R$, then the relation

(4.6)   $\bar{u}\bar{v} + \bar{w} = 0$

holds in $\bar{R}$. For, since $\pi$ is a homomorphism, $\bar{u}\bar{v} + \bar{w} = \overline{uv + w} = \bar{a} = 0$.

A good example of this construction is the relation $n = 0$ in the ring of integers
$\mathbb{Z}$. The resulting ring is $\mathbb{Z}/n\mathbb{Z}$.

More generally, we can introduce any number of relations $a_1 = \cdots = a_n = 0$,
by taking the ideal $I$ generated by $a_1, \ldots, a_n$ (3.15), which is the set of linear
combinations $\{r_1 a_1 + \cdots + r_n a_n \mid r_i \in R\}$. The quotient ring $\bar{R} = R/I$ should be viewed
as the ring obtained by introducing the $n$ relations $a_1 = 0, \ldots, a_n = 0$ into $R$. Since
$a_i \in I$, the residues $\bar{a}_i$ are zero. Two elements $b, b'$ of $R$ have the same image in $\bar{R}$ if
and only if $b' - b \in I$, or $b' = b + r_1 a_1 + \cdots + r_n a_n$, for some $r_i \in R$. Thus the
relations

(4.7)   $b + r_1 a_1 + \cdots + r_n a_n = b$

are the only consequences of $a_1 = \cdots = a_n = 0$.

It follows from the Third Isomorphism Theorem (4.3b) that introducing relations
one at a time or all together leads to isomorphic results. To be precise, let $a, b$
be elements of a ring $R$, and let $\bar{R} = R/(a)$ be the result of killing $a$. Introducing the
relation $\bar{b} = 0$ into the ring $\bar{R}$ leads to the quotient ring $\bar{R}/(\bar{b})$, and this ring is
isomorphic to the quotient $R/(a, b)$ obtained by killing $a$ and $b$ at the same time,
because $(a, b)$ and $(\bar{b})$ are corresponding ideals [see (4.3)].

Note that the more relations we add, the more collapsing takes place in the
map $R \to \bar{R}$. If we add them carelessly, the worst that can happen is that we may
end up with $I = R$ and $\bar{R} = 0$. All relations $a = 0$ become true when we collapse $R$
to the zero ring.
The procedure of introducing relations will lead to a new ring in most cases.
That is why it is so important. But in some simple cases the First Isomorphism
Theorem can be used to relate the ring obtained to a more familiar one. We will work
out two examples to illustrate this.

Let $R = \mathbb{Z}[i]$ be the ring of Gauss integers, and let $\bar{R}$ be obtained by introducing
the relation $1 + 3i = 0$. So $\bar{R} = R/I$ where $I$ is the principal ideal generated by
$1 + 3i$. We begin by experimenting with the relation, looking for recognizable
consequences. Multiplying $-1 = 3i$ on both sides by $-i$, we obtain $i = 3$. So $i = 3$ in
$\bar{R}$. On the other hand, $i^2 = -1$ in $R$, and hence in $\bar{R}$ too. Therefore $3^2 = -1$, or
$10 = 0$, in $\bar{R}$. Since $i = 3$ and $10 = 0$ in $\bar{R}$, it is reasonable to guess that $\bar{R}$ is
isomorphic to $\mathbb{Z}/(10) = \mathbb{Z}/10\mathbb{Z}$.

(4.8) Proposition. The ring $\mathbb{Z}[i]/(1 + 3i)$ is isomorphic to the ring $\mathbb{Z}/10\mathbb{Z}$ of
integers modulo 10.

Proof. Having made this guess, we can prove it by analyzing the homomorphism
$\varphi\colon \mathbb{Z} \to \bar{R}$ (3.9). By the First Isomorphism Theorem, $\operatorname{im} \varphi \approx \mathbb{Z}/(\ker \varphi)$.
So if we show that $\varphi$ is surjective and that $\ker \varphi = 10\mathbb{Z}$, we will have succeeded.
Now every element of $\bar{R}$ is the residue of a Gauss integer $a + bi$. Since $i = 3$ in $\bar{R}$,
the residue of $a + bi$ is the same as that of the integer $a + 3b$. This shows that $\varphi$ is
surjective. Next, let $n$ be an element of $\ker \varphi$. Using the fact that $\bar{R} = R/I$, we see
that $n$ must be in the ideal $I$, that is, that $n$ is divisible by $1 + 3i$ in the ring of Gauss
integers. So we may write $n = (a + bi)(1 + 3i) = (a - 3b) + (3a + b)i$ for some
integers $a, b$. Since $n$ is an integer, $3a + b = 0$, or $b = -3a$. Thus
$n = a(1 - 3i)(1 + 3i) = 10a$, and this shows that $\ker \varphi \subset 10\mathbb{Z}$. On the other
hand, we already saw that $10 \in \ker \varphi$. So $\ker \varphi = 10\mathbb{Z}$, as required. □
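The rule "replace $i$ by 3 and reduce modulo 10" suggested by this proof can be tested numerically. The sketch below represents a Gauss integer $a + bi$ as a pair `(a, b)`; this encoding and the names `gauss_mul`, `gauss_add`, `residue` are assumptions made for illustration. It checks on a finite sample that the rule respects sums and products, sends $1 + 3i$ to 0, and reaches every class modulo 10. Such a check is of course not a proof, and in particular it does not show that the kernel is no larger than $(1 + 3i)$.

```python
from itertools import product


def gauss_mul(z, w):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def gauss_add(z, w):
    return (z[0] + w[0], z[1] + w[1])


def residue(z):
    """Image of a + bi in Z/10Z under the rule i -> 3."""
    a, b = z
    return (a + 3 * b) % 10


sample = list(product(range(-4, 5), repeat=2))
respects_sums = all(
    residue(gauss_add(z, w)) == (residue(z) + residue(w)) % 10
    for z in sample for w in sample)
respects_products = all(
    residue(gauss_mul(z, w)) == (residue(z) * residue(w)) % 10
    for z in sample for w in sample)
kills_generator = residue((1, 3)) == 0
hits_everything = {residue((n, 0)) for n in range(10)} == set(range(10))
```

The product check works because $(a + 3b)(c + 3d)$ and $(ac - bd) + 3(ad + bc)$ differ by $10bd$.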

Another possible way to identify the quotient $R/I$ is to find a ring $R'$ and a
homomorphism $\varphi\colon R \to R'$ whose kernel is $I$. To illustrate this, let $\bar{R} = \mathbb{C}[x, y]/(xy)$.
Here the fact that $xy$ is a product can be used to find such a map $\varphi$.

(4.10) Proposition. The ring $\mathbb{C}[x, y]/(xy)$ is isomorphic to the subring of the
product ring $\mathbb{C}[x] \times \mathbb{C}[y]$ consisting of the pairs $(p(x), q(y))$ such that $p(0) = q(0)$.

Proof. We can identify the ring $\mathbb{C}[x, y]/(y)$ easily, because the principal ideal
$(y)$ is the kernel of the substitution homomorphism $\varphi\colon \mathbb{C}[x, y] \to \mathbb{C}[x]$ sending
$y \mapsto 0$. By the First Isomorphism Theorem, $\mathbb{C}[x, y]/(y) \approx \mathbb{C}[x]$. Similarly,
$\mathbb{C}[x, y]/(x) \approx \mathbb{C}[y]$. So it is natural to look at the homomorphism to the product
ring $\varphi\colon \mathbb{C}[x, y] \to \mathbb{C}[x] \times \mathbb{C}[y]$, which is defined by $f(x, y) \mapsto (f(x, 0), f(0, y))$.
The kernel of $\varphi$ is the intersection of the kernels: $\ker \varphi = (y) \cap (x)$. To be in this
intersection, a polynomial must be divisible by both $y$ and $x$. This just means that it
is divisible by $xy$. So $\ker \varphi = (xy)$. By the First Isomorphism Theorem,
$\bar{R} = \mathbb{C}[x, y]/(xy)$ is isomorphic to the image of the homomorphism $\varphi$. That image is
the subring described in the statement of the proposition. □
Aside from the First Isomorphism Theorem, there are no general methods for
identifying a quotient ring, because it will usually not be a familiar ring. The ring
$\mathbb{C}[x, y]/(y^2 - x^3 + x)$, for example, is fundamentally different from any ring we
have seen up to now.


5. ADJUNCTION OF ELEMENTS 

In this section we discuss a procedure which is closely related to the introduction of
relations, that of adding new elements to a ring. Our model for this procedure is the
construction of the complex field, starting from the real numbers. One obtains $\mathbb{C}$
from $\mathbb{R}$ by adjoining $i$, and the construction is completely formal. That is, the
imaginary number $i$ has no properties other than those forced by the relation

(5.1)   $i^2 = -1$.

We are now ready to understand the general principle behind this construction. Let
us start with an arbitrary ring $R$, and consider the problem of building a bigger ring
containing the elements of $R$ and also containing a new element, which we denote
by $\alpha$. We will probably want $\alpha$ to satisfy some relations such as (5.1), for instance.
A ring $R'$ containing $R$ as a subring is called a ring extension of $R$. So we are
looking for a suitable extension.

Sometimes the element $\alpha$ may be available in a ring extension $R'$ that we
already know. In that case, our solution is the subring of $R'$ generated by $R$ and $\alpha$.
This subring is denoted by $R[\alpha]$. We have already described this ring in Section 1,
in the case $R = \mathbb{Z}$ and $R' = \mathbb{C}$. The description is no different in general: $R[\alpha]$
consists of the elements of $R'$ which have polynomial expressions

$r_n \alpha^n + \cdots + r_1 \alpha + r_0$

with coefficients $r_i$ in $R$. But as happens when we first construct $\mathbb{C}$ from $\mathbb{R}$, we may
not yet know an extension containing $\alpha$. Then we must construct it abstractly.
Actually, we already did this when we constructed the polynomial ring $R[x]$.

Note that the polynomial ring $R[x]$ is an extension of $R$ and that it is generated
by $R$ and $x$. So the notation $R[x]$ agrees with the one introduced above. Moreover,
the Substitution Principle (3.4) tells us that the polynomial ring is the universal
solution to our problem of adjoining a new element, in the following sense: If $\alpha$ is an
element of any ring extension $R'$ of $R$, then there is a unique map $R[x] \to R'$
which is the identity on $R$ and which carries $x$ to $\alpha$. The image of this map will be
the subring $R[\alpha]$.

Let us now consider the question of the relations which we want our new
element to satisfy. The variable $x$ in the polynomial ring $R[x]$ satisfies no relations
except those, such as $0x = 0$, implied by the ring axioms. This is another way to state
the universal property of the polynomial ring. We may want some nontrivial
relations. But now that we have the ring $R[x]$ in hand we can add relations to it as we
like, using the procedure given in Section 4. We introduce relations by using the
quotient construction on the polynomial ring $R[x]$. The fact that $R$ gets replaced by
$R[x]$ in the construction complicates things notationally, but aside from this
notational complication, nothing is different.

For example, we can construct the complex numbers formally by introducing
the relation $x^2 + 1 = 0$ into the ring of real polynomials $\mathbb{R}[x] = P$. To do so, we
form the quotient ring $\bar{P} = P/(x^2 + 1)$. The residue of $x$ becomes our element $i$.
Note that the relation $\bar{x}^2 + 1 = \overline{x^2 + 1} = 0$ holds in $\bar{P}$, because the map $\pi\colon P \to \bar{P}$
is a homomorphism and because $x^2 + 1 \in \ker \pi$. And since 1 is the unit element in
$\bar{P}$, our standard notation for the unit element drops the bar. So $\bar{P}$ is obtained from $\mathbb{R}$
by adjoining an element $\bar{x}$ satisfying $\bar{x}^2 + 1 = 0$. In other words, $\bar{P} \approx \mathbb{C}$ as
required.

The fact that the quotient $\mathbb{R}[x]/(x^2 + 1)$ is isomorphic to $\mathbb{C}$ also follows from
the First Isomorphism Theorem (4.2b): Substitution (3.4) of $i$ for $x$ defines a
surjective homomorphism $\varphi\colon \mathbb{R}[x] \to \mathbb{C}$, whose kernel is the set of real polynomials
with $i$ as a root. Now if $i$ is a root of a real polynomial $p(x)$, then $-i$ is also a root.
Therefore $x - i$ and $x + i$ both divide $p(x)$. The kernel is the set of real polynomials
divisible by $(x - i)(x + i) = x^2 + 1$, which is the principal ideal $(x^2 + 1)$. By
the First Isomorphism Theorem, $\mathbb{C}$ is isomorphic to $\mathbb{R}[x]/(x^2 + 1)$.

Another simple example of adjunction of an element was used in Section 6 of
Chapter 8, where a formal infinitesimal element satisfying

(5.2)   $\epsilon^2 = 0$

was introduced to compute tangent vectors. An element of a ring $R$ is called
infinitesimal or nilpotent if some power is zero, and our procedure allows us to
adjoin infinitesimals to a ring. Thus the result of adjoining an element $\epsilon$ satisfying
(5.2) to a ring $R$ is the quotient ring $R' = R[x]/(x^2)$. The residue of $x$ is the
infinitesimal element $\epsilon$. In this ring, the relation $\epsilon^2 = 0$ reduces all polynomial
expressions in $\epsilon$ to degree $< 2$, so the elements of $R'$ have the form $a + b\epsilon$, with
$a, b \in R$. But the multiplication rule [Chapter 8 (6.5)] is different from the rule for
multiplying complex numbers.
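Since $\epsilon^2 = 0$, the product of two such elements is $(a + b\epsilon)(c + d\epsilon) = ac + (ad + bc)\epsilon$. A minimal sketch of this arithmetic follows, assuming the coefficients are ordinary Python numbers; the class name `Dual` is a hypothetical choice made here and does not come from the text.

```python
class Dual:
    """Element a + b*eps of R[x]/(x^2), with a, b numbers and eps^2 = 0."""

    def __init__(self, a, b=0):
        self.a, self.b = a, b

    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, since eps^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)


eps = Dual(0, 1)
x = Dual(2, 1)                 # the element 2 + eps
cube = x * x * x               # (2 + eps)^3 = 8 + 12 eps
```

The coefficient of $\epsilon$ in `cube` is 12, the derivative of $t^3$ at $t = 2$, which is the kind of first-order information that infinitesimals are used to extract.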

In general, if we want to adjoin an element $\alpha$ satisfying one or more polynomial
relations of the form

(5.3)   $f(\alpha) = c_n \alpha^n + \cdots + c_1 \alpha + c_0 = 0$

to a ring $R$, the solution is $R' = R[x]/I$, where $I$ is the ideal in $R[x]$ generated by
the polynomials $f(x)$. If $\alpha$ denotes the residue $\bar{x}$ of $x$ in $R'$, then

(5.4)   $0 = \overline{f(x)} = \bar{c}_n \bar{x}^n + \cdots + \bar{c}_0 = \bar{c}_n \alpha^n + \cdots + \bar{c}_0$.

Here $\bar{c}_i$ is the image in $R'$ of the constant polynomial $c_i$. So $\alpha$ satisfies the relation in
$R'$ which corresponds to the relation (5.3) in $R$. The ring obtained in this way will
often be denoted by

(5.5)   $R[\alpha]$ = ring obtained by adjoining $\alpha$ to $R$.

Several elements $\alpha_1, \ldots, \alpha_m$ can be adjoined by repeating this procedure, or by
introducing the appropriate relations in the polynomial ring $R[x_1, \ldots, x_m]$ in $m$ variables
all at once.
One of the most important cases is that the new element $\alpha$ is required to satisfy
a single monic equation of degree $n > 0$. Suppose we want the relation $f(x) = 0$,
where $f$ is the monic polynomial

(5.6)   $f(x) = x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0$.

It isn’t difficult to describe the ring $R[\alpha]$ precisely in this special case.

(5.7) Proposition. Let $R$ be a ring, and let $f(x)$ be a monic polynomial of positive
degree $n$, with coefficients in $R$. Let $R[\alpha]$ denote the ring obtained by adjoining an
element satisfying the relation $f(\alpha) = 0$. The elements of $R[\alpha]$ are in bijective
correspondence with vectors $(r_0, \ldots, r_{n-1}) \in R^n$. Such a vector corresponds to the
linear combination

$r_0 + r_1 \alpha + r_2 \alpha^2 + \cdots + r_{n-1} \alpha^{n-1}$, with $r_i \in R$.

This proposition says that the powers $1, \alpha, \alpha^2, \ldots, \alpha^{n-1}$ form a basis for $R[\alpha]$
over $R$. To multiply two such linear combinations in $R[\alpha]$, we use polynomial
multiplication and then divide the product by $f$. The remainder is the linear combination
of $1, \alpha, \ldots, \alpha^{n-1}$ which represents the product. So although addition in $R'$ depends
only on the degree, multiplication depends strongly on the particular polynomial $f$.

For example, let $R'$ be the result of adjoining an element $\alpha$ to $\mathbb{Z}$ satisfying the
relation $\alpha^3 + 3\alpha + 1 = 0$. So $R' = \mathbb{Z}[x]/(x^3 + 3x + 1)$. The elements of $R'$ are
linear combinations $r_0 + r_1 \alpha + r_2 \alpha^2$, where $r_i$ are integers. Addition of two linear
combinations is polynomial addition: $(2 + \alpha - \alpha^2) + (1 + \alpha) = 3 + 2\alpha - \alpha^2$, for
instance. To multiply, we compute the product using polynomial multiplication:
$(2 + \alpha - \alpha^2)(1 + \alpha) = 2 + 3\alpha - \alpha^3$. Then we divide by $1 + 3\alpha + \alpha^3$: $2 + 3\alpha - \alpha^3 =
(1 + 3\alpha + \alpha^3)(-1) + (3 + 6\alpha)$. Since $1 + 3\alpha + \alpha^3 = 0$ in $R'$, the remainder $3 + 6\alpha$ is
the linear combination which represents the product.
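The multiply-then-reduce rule is easy to mechanize. In the sketch below an element $r_0 + r_1\alpha + r_2\alpha^2$ is stored as the list `[r0, r1, r2]`; this representation and the names `multiply` and `reduce_mod_monic` are assumptions made for illustration.

```python
def multiply(u, v):
    """Product of two nonempty coefficient lists (lowest degree first)."""
    out = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out


def reduce_mod_monic(g, f):
    """Remainder of g on division by the monic polynomial f.

    The result is padded with zeros to exactly deg f coefficients.
    """
    r = list(g)
    n = len(f) - 1                    # degree of f
    while len(r) > n:
        lead = r.pop()                # coefficient of the top power of x
        shift = len(r) - n            # that power is x^(shift + n)
        for i in range(n):            # subtract lead * x^shift * f; the
            r[shift + i] -= lead * f[i]   # top term cancels since f is monic
    return r + [0] * (n - len(r))


f = [1, 3, 0, 1]                      # x^3 + 3x + 1
u = [2, 1, -1]                        # 2 + alpha - alpha^2
v = [1, 1]                            # 1 + alpha
raw_product = multiply(u, v)          # 2 + 3x - x^3
product_in_quotient = reduce_mod_monic(raw_product, f)   # 3 + 6 alpha
```

The result `[3, 6, 0]` agrees with the remainder $3 + 6\alpha$ found by hand.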

Or let $R'$ be obtained by adjoining an element $\alpha$ to $\mathbb{F}_5$ with the relation
$\alpha^2 - 3 = 0$, that is, $R' = \mathbb{F}_5[x]/(x^2 - 3)$. Here $\alpha$ represents a formal square root
of 3. The elements of $R'$ are the 25 linear expressions $a + b\alpha$ in $\alpha$ with coefficients
$a, b \in \mathbb{F}_5$. This ring is a field. To prove this, we verify that every nonzero element
$a + b\alpha$ of $R'$ is invertible. Note that $(a + b\alpha)(a - b\alpha) = a^2 - 3b^2 \in \mathbb{F}_5$.
Moreover, the equation $x^2 = 3$ has no solution in $\mathbb{F}_5$, and this implies that $a^2 - 3b^2 \neq 0$.
Therefore $a^2 - 3b^2$ is invertible in $\mathbb{F}_5$ and in $R'$. This shows that $a + b\alpha$ is
invertible too. Its inverse is $(a^2 - 3b^2)^{-1}(a - b\alpha)$.

On the other hand, the same procedure applied to $\mathbb{F}_{11}$ does not yield a field.
The reason is that $x^2 - 3 = (x + 5)(x - 5)$ in $\mathbb{F}_{11}[x]$. So if $\alpha$ denotes the residue of
$x$ in $R' = \mathbb{F}_{11}[x]/(x^2 - 3)$, then $(\alpha + 5)(\alpha - 5) = 0$. This can be explained
intuitively by noting that we constructed $R'$ by adjoining a square root of 3 to $\mathbb{F}_{11}$ when
that field already contains the two square roots $\pm 5$. At first glance, one might expect
to get $\mathbb{F}_{11}$ back by this procedure. But we haven’t told $\alpha$ whether to be equal to 5 or
to $-5$. We’ve only told it that its square is 3. The relation $(\alpha + 5)(\alpha - 5) = 0$
reflects this ambiguity. □
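Both examples are small enough to examine exhaustively. The following sketch stores $a + b\alpha$ as the pair `(a, b)` and searches by brute force for inverses and for zero divisors; the function name `units_and_zero_divisors` and the pair encoding are assumptions made here for illustration.

```python
def units_and_zero_divisors(p):
    """Classify the nonzero elements a + b*alpha of F_p[x]/(x^2 - 3), p prime.

    Elements are pairs (a, b). Since alpha^2 = 3,
    (a + b alpha)(c + d alpha) = (ac + 3bd) + (ad + bc) alpha.
    """
    def mul(u, v):
        (a, b), (c, d) = u, v
        return ((a * c + 3 * b * d) % p, (a * d + b * c) % p)

    nonzero = [(a, b) for a in range(p) for b in range(p) if (a, b) != (0, 0)]
    units = [u for u in nonzero
             if any(mul(u, v) == (1, 0) for v in nonzero)]
    zero_divisors = [u for u in nonzero
                     if any(mul(u, v) == (0, 0) for v in nonzero)]
    return units, zero_divisors


units_5, zero_divisors_5 = units_and_zero_divisors(5)
units_11, zero_divisors_11 = units_and_zero_divisors(11)
```

Over $\mathbb{F}_5$ all 24 nonzero elements turn out to be units, while over $\mathbb{F}_{11}$ the element $\alpha + 5$, stored as `(5, 1)`, appears among the zero divisors.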
Proof of Proposition (5.7). Since $R[\alpha]$ is a quotient of the polynomial ring
$R[x]$, every element in $R[\alpha]$ is the residue of a polynomial. This means that it can
be written in the form $g(\alpha)$ for some polynomial $g(x) \in R[x]$. The relation
$f(\alpha) = 0$ can be used to replace any polynomial $g(\alpha)$ of degree $\geq n$ by one of lower
degree: We perform division with remainder by $f(x)$ on the polynomial $g(x)$,
obtaining an expression of the form $g(x) = f(x)q(x) + r(x)$ (3.19). Since $f(\alpha) = 0$,
$g(\alpha) = r(\alpha)$. Thus every element $\beta$ of $R[\alpha]$ can be written as a polynomial in $\alpha$, of
degree $< n$.

We now show that the principal ideal generated by $f(x)$ contains no nonzero element of
degree $< n$, and therefore that $g(\alpha) \neq 0$ for every nonzero polynomial $g(x)$ of
degree $< n$. This will imply that the expression of degree $< n$ for an element $\beta$ is
unique. The principal ideal generated by $f(x)$ is the set of all multiples $hf$ of $f$.
Suppose $h(x) = b_m x^m + \cdots + b_0$, with $b_m \neq 0$. Then the highest-degree term of
$h(x)f(x)$ is $b_m x^{m+n}$, and hence $hf$ has degree $m + n \geq n$. This completes the proof
of the proposition. □

It is harder to analyze the structure of the ring obtained by adjoining an
element which satisfies a nonmonic polynomial relation. One of the simplest and most
important cases is obtained by adjoining a multiplicative inverse of an element to a
ring. If an element $a \in R$ has an inverse $\alpha$, then $\alpha$ satisfies the relation

(5.8)   $a\alpha - 1 = 0$.

So we can adjoin an inverse by forming the quotient ring $R' = R[x]/(ax - 1)$. The
residue of $x$ becomes the inverse $\alpha$ of $a$. This ring has no basis of the type described
in Proposition (5.7), but we can compute in it fairly easily because every element of
$R'$ has the form $\alpha^k r$, where $r \in R$ and $k$ is a nonnegative integer: Say that
$\beta = r_0 + r_1 \alpha + \cdots + r_{n-1} \alpha^{n-1}$, with $r_i \in R$. Then since $a\alpha = 1$, we can also
write $\beta = \alpha^{n-1}(r_0 a^{n-1} + r_1 a^{n-2} + \cdots + r_{n-1})$.

One interesting example is that $R$ is a polynomial ring itself, say $R = F[t]$,
and that we adjoin an inverse to the variable $t$. Then $R' = F[t, x]/(xt - 1)$. This
ring identifies naturally with the ring $F[t, t^{-1}]$ of Laurent polynomials in $t$. A
Laurent polynomial is a polynomial in $t$ and $t^{-1}$ of the form

(5.9)   $f(t) = \sum_{i=-n}^{n} a_i t^i = a_{-n} t^{-n} + \cdots + a_{-1} t^{-1} + a_0 + a_1 t + \cdots + a_n t^n$.

We leave the construction of this isomorphism as an exercise.

We must now consider a point which we have suppressed in our discussion of
adjunction of elements: When we adjoin an element $\alpha$ to a ring $R$ and impose some
relations, will our original $R$ be a subring of the ring $R[\alpha]$ which we obtain? We
know that $R$ is contained in the polynomial ring $R[x]$, as the subring of constant
polynomials. So the restriction of the canonical map $\pi\colon R[x] \to R[x]/I = R[\alpha]$
to constant polynomials gives us a homomorphism $\psi\colon R \to R[\alpha]$, which is the map
$r \mapsto \bar{r}$ considered above. The kernel of the map $\psi\colon R \to R[\alpha] = R[x]/I$ is easy
to determine in principle. It is the set of constant polynomials in the ideal $I$:

(5.10)   $\ker \psi = R \cap I$.

It follows from Proposition (5.7) that $\psi$ is injective, and hence that $\ker \psi = 0$, when
$\alpha$ is required to satisfy one monic equation. But $\psi$ is not always injective.

For example, we had better not adjoin an inverse of 0 to a ring. From the
equation $0\alpha = 1$ we can conclude that $0 = 1$. The zero element is invertible only
in the zero ring, so if we insist on adjoining an inverse of 0, we must end up with
the zero ring.

More generally, let $a, b$ be two elements of a ring $R$ whose product $ab$ is zero.
Then $a$ is not invertible unless $b = 0$. For, if $a^{-1}$ exists in $R$, then
$b = a^{-1}ab = a^{-1}0 = 0$. It follows that if a product $ab$ of two elements of a
ring $R$ is zero, then the procedure of adjoining an inverse of $a$ to $R$ must kill
$b$. This can also be seen directly: The ideal of $R[x]$ generated by $ax - 1$
contains $-b(ax - 1) = b$, which shows that the residue of $b$ in the ring
$R[x]/(ax - 1)$ is zero.

For example, $\bar{2} \cdot \bar{3} = 0$ in the ring $\mathbb{Z}/(6)$. If we adjoin
$\bar{3}^{-1}$ to this ring, we must kill $\bar{2}$. Killing $\bar{2}$ collapses
$\mathbb{Z}/(6)$ to $\mathbb{Z}/(2) = \mathbb{F}_2$. Since $\bar{3} = \bar{1}$ is invertible in
$\mathbb{F}_2$, no further action is necessary, and
$R' = (\mathbb{Z}/(6))[x]/(3x - 1) \approx \mathbb{F}_2$. Again, this can be checked
directly. To do so, we note that the ring $R'$ is isomorphic to
$\mathbb{Z}[x]/(6, 3x - 1)$, and we analyze the two relations $6 = 0$ and $3x - 1 = 0$.
They imply $6x = 0$ and $6x - 2 = 0$; hence $2 = 0$. Then $2x = 0$ too, and combined
with $3x - 1 = 0$, this implies $x - 1 = 0$. Hence the ideal $(6, 3x - 1)$ of
$\mathbb{Z}[x]$ contains the elements $2$ and $x - 1$. On the other hand, $6$ and
$3x - 1$ are in the ideal $(2, x - 1)$. So the two ideals are equal, and $R'$ is
isomorphic to $\mathbb{Z}[x]/(2, x - 1) \approx \mathbb{F}_2$.
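
The collapse can also be watched numerically. The following is a minimal sketch, not part of the text's argument: it assumes the standard fact (not proved here) that an element $b$ is killed by adjoining $a^{-1}$ exactly when $a^k b = 0$ for some $k \geq 0$; the function name `kernel_of_inverting` is an illustrative choice.

```python
def kernel_of_inverting(n, a):
    """Residues b in Z/(n) with a**k * b == 0 (mod n) for some k >= 0.

    Assumption (standard, not proved in the text): these are exactly the
    elements killed when an inverse of a is adjoined to Z/(n).
    """
    killed = set()
    for b in range(n):
        power = 1  # a**0
        # The powers of a modulo n take at most n distinct values,
        # so n + 1 steps see every one of them.
        for _ in range(n + 1):
            if (power * b) % n == 0:
                killed.add(b)
                break
            power = (power * a) % n
    return sorted(killed)


# Inverting 3 in Z/(6) kills 0, 2, 4, leaving a ring with 6 / 3 = 2 elements.
print(kernel_of_inverting(6, 3))
# Inverting 0 kills everything: only the zero ring survives.
print(kernel_of_inverting(6, 0))
```

For $n = 6$ and $a = 3$ the killed residues form the ideal generated by $\bar{2}$, in agreement with the computation above.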

An element $a$ of a ring is called a zero divisor if there is a nonzero element $b$
such that $ab = 0$. For example, the residue of 3 is a zero divisor in the ring
$\mathbb{Z}/(6)$. The term "zero divisor" is traditional, but it has been poorly
chosen, because actually every $a \in R$ divides zero: $0 = a0$.


6. INTEGRAL DOMAINS AND FRACTION FIELDS 


The difference between rings and fields is that nonzero elements of a ring $R$ do
not necessarily have inverses. In this section we discuss the problem of embedding
a given ring $R$ as a subring into a field. We saw in the last section that we
cannot adjoin the inverse of a zero divisor without killing some elements. So a
ring which contains zero divisors cannot be embedded into a field.

(6.1) Definition. An integral domain $R$ is a nonzero ring having no zero divisors.
In other words, it has the property that if $ab = 0$, then $a = 0$ or $b = 0$, and
also $1 \neq 0$ in $R$.

For example, any subring of a field is an integral domain.

An integral domain satisfies the cancellation law:

(6.2) If $ab = ac$ and $a \neq 0$, then $b = c$.

For, from $ab = ac$ we can deduce $a(b - c) = 0$. Then since $a \neq 0$, it follows
that $b - c = 0$. □

(6.3) Proposition. Let $R$ be an integral domain. Then the polynomial ring $R[x]$ is
an integral domain.

(6.4) Proposition. An integral domain with finitely many elements is a field.

We leave the proofs of these propositions as exercises. □

(6.5) Theorem. Let $R$ be an integral domain. There exists an embedding of $R$ into
a field, meaning an injective homomorphism $R \to F$, where $F$ is a field.

We could construct the field by adjoining inverses of all nonzero elements of $R$,
using the procedure described in the last section. But in this case it is somewhat
simpler to construct $F$ with fractions. Our model is the construction of the
rational numbers as fractions of integers, and once the idea of using fractions is
put forward, the construction follows the construction of the rational numbers very
closely.

Let $R$ be an integral domain. A fraction will be a symbol $a/b$ where $a, b \in R$
and $b \neq 0$. Two fractions $a_1/b_1, a_2/b_2$ are called equivalent,
$a_1/b_1 \approx a_2/b_2$, if

$a_1 b_2 = a_2 b_1$.

Let us check transitivity of this relation; the reflexive and symmetric properties
are clear (see Chapter 2, Section 5). Suppose that $a_1/b_1 \approx a_2/b_2$ and also
that $a_2/b_2 \approx a_3/b_3$. Then $a_1 b_2 = a_2 b_1$ and $a_2 b_3 = a_3 b_2$. Multiply
by $b_3$ and $b_1$ to obtain

$a_1 b_2 b_3 = a_2 b_1 b_3 = a_3 b_2 b_1$.

Cancel $b_2$ to get $a_1 b_3 = a_3 b_1$. Thus $a_1/b_1 \approx a_3/b_3$.

The field of fractions $F$ of $R$ is the set of equivalence classes of fractions.
As we do with rational numbers, we will speak of fractions $a_1/b_1, a_2/b_2$ as
equal elements of $F$ if they are equivalent fractions: $a_1/b_1 = a_2/b_2$ in $F$
means $a_1 b_2 = a_2 b_1$. Addition and multiplication of fractions are defined as in
arithmetic:

$(a/b)(c/d) = ac/bd, \qquad a/b + c/d = \dfrac{ad + bc}{bd}$.

Here it must be verified that these rules lead to equivalent answers if $a/b$ and
$c/d$ are replaced by equivalent fractions. Then the axioms for a field must be
verified. All of these verifications are straightforward exercises. □
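
The construction can be sketched in a few lines of code. This is only an illustration under stated assumptions: the integral domain is taken to be the integers, and the class name `Frac` is hypothetical. Equality is tested by cross-multiplication, exactly as in the definition of equivalent fractions, and no reduction to lowest terms is ever performed.

```python
class Frac:
    """A fraction a/b over the integers with b != 0 (illustrative sketch)."""

    def __init__(self, a, b):
        if b == 0:
            raise ZeroDivisionError("the denominator of a fraction must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # a1/b1 and a2/b2 are equivalent exactly when a1*b2 == a2*b1.
        return self.a * other.b == other.a * self.b

    def __add__(self, other):
        # a/b + c/d = (ad + bc)/bd
        return Frac(self.a * other.b + other.a * self.b, self.b * other.b)

    def __mul__(self, other):
        # (a/b)(c/d) = ac/bd
        return Frac(self.a * other.a, self.b * other.b)

    def __repr__(self):
        return f"{self.a}/{self.b}"


# Replacing summands by equivalent fractions gives an equivalent answer.
print(Frac(1, 2) + Frac(1, 3) == Frac(2, 4) + Frac(2, 6))
```

The denominator $bd$ of a sum or product is nonzero because the integers form an integral domain; this is the one place where the hypothesis on $R$ is used in the arithmetic.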

Notice that $R$ is contained in $F$, provided that we identify $a \in R$ with the
fraction $a/1$, because $a/1 = b/1$ only if $a = b$. The map $a \mapsto a/1$ is the
injective homomorphism referred to in the theorem.

As an example, consider the polynomial ring $K[x]$, where $K$ is any field. This
is an integral domain, and its fraction field is called the field of rational
functions in $x$, with coefficients in $K$. This field is usually denoted by $K(x)$:

(6.6) $K(x) = \{$equivalence classes of fractions $f/g$, where $f, g$ are polynomials
and $g$ is not the zero polynomial$\}$.


If $K = \mathbb{R}$, then evaluation of a rational function $f(x)/g(x)$ defines an
actual function on the real line, wherever $g(x) \neq 0$. But as with polynomials,
we should distinguish between the formally defined rational functions, which are
fractions of polynomials, and the actual functions which they define by evaluation.

The fraction field is a universal solution to the problem of embedding an
integral domain into a field. This is shown by the following proposition:

(6.7) Proposition. Let $R$ be an integral domain, with field of fractions $F$, and
let $\varphi: R \to K$ be any injective homomorphism of $R$ to a field $K$. Then the rule

$\Phi(a/b) = \varphi(a)\varphi(b)^{-1}$

defines the unique extension of $\varphi$ to a homomorphism $\Phi: F \to K$.

Proof. We must check that this extension is well defined. First, since the
denominator of a fraction is not allowed to be zero and since $\varphi$ is injective,
$\varphi(b) \neq 0$ for any fraction $a/b$. Therefore $\varphi(b)$ is invertible in $K$, and
$\varphi(a)\varphi(b)^{-1}$ is an element of $K$. Next, we check that equivalent fractions
have the same image: If $a_2/b_2 \approx a_1/b_1$, then $a_2 b_1 = a_1 b_2$; hence
$\varphi(a_2)\varphi(b_1) = \varphi(a_1)\varphi(b_2)$, and
$\Phi(a_2/b_2) = \varphi(a_2)\varphi(b_2)^{-1} = \varphi(a_1)\varphi(b_1)^{-1} = \Phi(a_1/b_1)$, as
required. The facts that $\Phi$ is a homomorphism and that it is the unique extension
of $\varphi$ follow easily. □


7. MAXIMAL IDEALS


In this section we investigate surjective homomorphisms

(7.1) $\varphi: R \to F$

from a ring $R$ to a field $F$. Given such a homomorphism, the First Isomorphism
Theorem tells us that $F$ is isomorphic to $R/\ker \varphi$. Therefore we can recover
$F$ and $\varphi$, up to isomorphism, from the kernel. To classify such homomorphisms,
we must determine the ideals $M$ such that $R/M$ is a field.

By the Correspondence Theorem (4.3), the ideals of $\bar{R} = R/M$ correspond to
ideals of $R$ which contain $M$. Also, fields are characterized by the property of
having exactly two ideals (3.16). So if $\bar{R}$ is a field, there are exactly two
ideals containing $M$, namely $M$ and $R$. Such an ideal is called maximal.

(7.2) Definition. An ideal $M$ is maximal if $M \neq R$ but $M$ is not contained in
any ideals other than $M$ and $R$.

(7.3) Corollary.

(a) An ideal $M$ of a ring $R$ is maximal if and only if $\bar{R} = R/M$ is a field.

(b) The zero ideal of $R$ is maximal if and only if $R$ is a field. □

The next proposition follows from the fact that all ideals of $\mathbb{Z}$ are principal:

(7.4) Proposition. The maximal ideals of the ring $\mathbb{Z}$ of integers are the
principal ideals generated by prime integers. □
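
By Corollary (7.3a), the proposition says that $\mathbb{Z}/(n)$ is a field exactly when $n$ is prime. A brute-force check for small $n$ is sketched below; the function name `quotient_is_field` is an illustrative choice, and the search is meant as a sanity check rather than an efficient test.

```python
def quotient_is_field(n):
    """True if every nonzero residue in Z/(n) has a multiplicative inverse."""
    if n < 2:
        return False  # Z/(1) is the zero ring, which is not a field
    return all(
        any((a * b) % n == 1 for b in range(1, n))
        for a in range(1, n)
    )


# The moduli below 30 for which the quotient ring is a field.
print([n for n in range(2, 30) if quotient_is_field(n)])
# prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```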

The maximal ideals of the ring $\mathbb{C}[x]$ of complex polynomials in one variable
can also be described very simply:

(7.5) Proposition. The maximal ideals of the polynomial ring $\mathbb{C}[x]$ are the
principal ideals generated by the linear polynomials $x - a$. The ideal $M_a$
generated by $x - a$ is the kernel of the substitution homomorphism
$s_a: \mathbb{C}[x] \to \mathbb{C}$ which sends $f(x) \mapsto f(a)$. Thus there is a bijective
correspondence between maximal ideals $M_a$ and complex numbers $a$.

Proof. We first show that every maximal ideal is generated by a linear polynomial
$x - a$. Let $M$ be maximal. By Proposition (3.21), $M$ is a principal ideal,
generated by the monic polynomial $f \in M$ of least degree. Since every complex
polynomial of positive degree has a root, $f$ is divisible by some linear polynomial
$x - a$. Then $f$ is in the principal ideal $(x - a)$, and hence $M \subset (x - a)$.
Since $M$ is maximal, $M = (x - a)$.

Next, we show that the kernel of the substitution homomorphism $s_a$ is generated
by $x - a$: To say that a polynomial $g$ is in the kernel of $s_a$ means that $a$ is a
root of $g$, or that $x - a$ divides $g$. Thus $x - a$ generates $\ker s_a$. Since the
image of $s_a$ is a field, this also shows that $(x - a)$ is a maximal ideal. □

The extension of Proposition (7.5) to several variables is one of the most
important theorems about polynomial rings.

(7.6) Theorem. Hilbert's Nullstellensatz: The maximal ideals of the polynomial
ring $\mathbb{C}[x_1, \ldots, x_n]$ are in bijective correspondence with points of complex
$n$-dimensional space. A point $a = (a_1, \ldots, a_n)$ in $\mathbb{C}^n$ corresponds to the
kernel of the substitution map $s_a: \mathbb{C}[x_1, \ldots, x_n] \to \mathbb{C}$, which sends
$f(x) \mapsto f(a)$. The kernel $M_a$ of this map is the ideal generated by the linear
polynomials

$x_1 - a_1, \ldots, x_n - a_n$.

Proof. Let $a \in \mathbb{C}^n$, and let $M_a$ be the kernel of the substitution map
$s_a$. Since $s_a$ is surjective and $\mathbb{C}$ is a field, $M_a$ is a maximal ideal. Next,
let us verify that $M_a$ is generated by the linear polynomials, as asserted. To do
so, we expand $f(x)$ in powers of $x_1 - a_1, \ldots, x_n - a_n$, writing

$f(x) = f(a) + \sum_i c_i (x_i - a_i) + \sum_{i,j} c_{ij} (x_i - a_i)(x_j - a_j) + \cdots$.

You may recognize this as Taylor's expansion: $c_i = \partial f/\partial x_i$, and so on.
The existence of such an expansion can be derived algebraically by substituting
$x = a + u$ into $f$, expanding in powers of the variables $u$, and then substituting
$u = x - a$ back into the result. Note that every term on the right side except
$f(a)$ is divisible by at least one of the polynomials $(x_i - a_i)$. So if $f$ is in
the kernel of $s_a$, that is, if $f(a) = 0$, then $f(x)$ is in the ideal which these
elements generate. This shows that the polynomials $x_i - a_i$ generate $M_a$.
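
As a small worked instance of this expansion (the polynomial and the point are chosen here only for illustration), take $n = 2$, $f = x_1 x_2$, and $a = (1, 2)$. Substituting $x = a + u$ and then $u = x - a$ gives

```latex
\begin{align*}
f(a + u) &= (1 + u_1)(2 + u_2) = 2 + 2u_1 + u_2 + u_1 u_2, \\
f(x)     &= 2 + 2(x_1 - 1) + (x_2 - 2) + (x_1 - 1)(x_2 - 2).
\end{align*}
```

Here $f(a) = 2$, and every other term is visibly divisible by $x_1 - 1$ or by $x_2 - 2$.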

It is harder to prove that every maximal ideal is of the form $M_a$ for some point
$a \in \mathbb{C}^n$. To do so, let $M$ be any maximal ideal, and let $K$ denote the field
$\mathbb{C}[x_1, \ldots, x_n]/M$. We consider the restriction of the canonical map (4.1)
$\pi: \mathbb{C}[x_1, \ldots, x_n] \to K$ to the subring $\mathbb{C}[x_1]$ of polynomials in one
variable:

$\pi_1: \mathbb{C}[x_1] \to K$.

(7.7) Lemma. The kernel of $\pi_1$ is either zero or else it is a maximal ideal.

Proof. Assume that the kernel is not zero, and let $f$ be a nonzero element in
$\ker \pi_1$. Since $K$ is not the zero ring, $\ker \pi_1$ is not the whole ring. So $f$ is
not constant, which implies that it is divisible by a linear polynomial, say
$f = (x_1 - a_1)g$. Then $\pi_1(x_1 - a_1)\pi_1(g) = \pi_1(f) = 0$ in $K$. Since $K$ is a
field, $\pi_1(x_1 - a_1) = 0$ or $\pi_1(g) = 0$. So one of the two elements $x_1 - a_1$ or
$g$ is in $\ker \pi_1$. By induction on the degree of $f$, $\ker \pi_1$ contains a linear
polynomial. Hence it is a maximal ideal (7.5). □

We are going to show that $\ker \pi_1$ is not the zero ideal. It will follow that $M$
contains a linear polynomial of the form $x_1 - a_1$. Since the index 1 can be
replaced by any other index, $M$ contains polynomials of the form $x_\nu - a_\nu$ for
every $\nu = 1, \ldots, n$. This will show that $M$ contains the kernel $M_a$ of a
substitution map $f(x) \mapsto f(a)$. Since $M_a$ is maximal and $M$ is not the whole
ring, $M$ is equal to $M_a$, as claimed.

So, suppose $\ker \pi_1 = (0)$. Then $\pi_1$ maps $\mathbb{C}[x_1]$ isomorphically to its
image, which is a subring of $K$. According to Proposition (6.7), this map can be
extended to the field of fractions of $\mathbb{C}[x_1]$. Hence $K$ contains a field
isomorphic to the field of rational functions $\mathbb{C}(x)$ [see (3.17)].

Now the monomials $x^i = x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$ form a basis of
$\mathbb{C}[x_1, \ldots, x_n]$ as a vector space over $\mathbb{C}$ (see Section 2). Thus
$\mathbb{C}[x_1, \ldots, x_n]$ has a countable basis (Appendix, Section 1). Since $K$ is a
quotient of $\mathbb{C}[x_1, \ldots, x_n]$, there is a countable family which spans $K$ as a
vector space over $\mathbb{C}$, namely the residues of the monomials. We will show that
there are uncountably many linearly independent elements in $\mathbb{C}(x)$. It will
follow [Lemma (7.9)] that $\mathbb{C}(x)$ cannot be isomorphic to a subspace of $K$. This
contradiction will show $\ker \pi_1 \neq (0)$.

The fact we need is that the elements of the complex field $\mathbb{C}$ do not form a
countable set [Appendix (1.7)]. Using this fact, the following two lemmas will
finish the proof:

(7.8) Lemma. The uncountably many rational functions $(x - a)^{-1}$, $a \in \mathbb{C}$,
are linearly independent.

Proof. A rational function $f/g$ defines an actual function by evaluation, at all
points of the complex plane at which $g \neq 0$. The rational function $(x - a)^{-1}$
has a pole at $a$, which means that it takes on arbitrarily large values near $a$.
It is bounded near any other point. Consider a linear combination

$\sum_{i=1}^{n} \dfrac{c_i}{x - a_i}$,

where $a_1, \ldots, a_n$ are distinct complex numbers and where some coefficient, say
$c_1$, is not zero. The first term of this sum is unbounded near $a_1$, but the
others are bounded there. It follows that the linear combination does not define
the zero function; hence it is not zero. □


(7.9) Lemma. Let $V$ be a vector space which is spanned by a countable family
$\{v_1, v_2, \ldots\}$ of vectors. Then every set $L$ of linearly independent vectors in
$V$ is finite or countably infinite.

Proof. Let $L$ be a linearly independent subset of $V$, let $V_n$ be the span of the
first $n$ vectors $v_1, \ldots, v_n$, and let $L_n = L \cap V_n$. Then $L_n$ is a linearly
independent set in a finite-dimensional space $V_n$, hence it is a finite set
[Chapter 3 (3.16)]. Moreover, $L$ is the union of all the $L_n$'s. The union of
countably many finite sets is finite or countably infinite. □


8. ALGEBRAIC GEOMETRY 

To me algebraic geometry is algebra with a kick. 

Solomon Lefschetz 

Let $V$ be a subset of complex $n$-space $\mathbb{C}^n$. If $V$ can be defined as the set
of common zeros of a finite number of polynomials in $n$ variables, then it is
called an algebraic variety, or just a variety for short. (I don't know the origin
of this unattractive term.) For instance, a complex line in $\mathbb{C}^2$ is, by
definition, the set of solutions of a linear equation $ax + by + c = 0$. This is a
variety. So is a point. The point $(a, b)$ is the set of common zeros of the two
polynomials $x - a$ and $y - b$. We have seen a number of other interesting
varieties already. The group $SL_2(\mathbb{C})$, for example, being the locus of
solutions of the polynomial equation $x_{11}x_{22} - x_{12}x_{21} - 1 = 0$, is a variety
in $\mathbb{C}^4$.

Hilbert's Nullstellensatz provides us with an important link between algebra
and geometry. It tells us that the maximal ideals in the polynomial ring
$\mathbb{C}[x] = \mathbb{C}[x_1, \ldots, x_n]$ correspond to points in $\mathbb{C}^n$. This
correspondence can also be used to relate algebraic varieties to quotient rings of
the polynomial ring.


(8.1) Theorem. Let $f_1, \ldots, f_r$ be polynomials in $\mathbb{C}[x_1, \ldots, x_n]$, and let
$V$ be the variety defined by the system of equations $f_1(x) = 0, \ldots, f_r(x) = 0$.
Let $I$ be the ideal $(f_1, \ldots, f_r)$ generated by the given polynomials. The maximal
ideals of the quotient ring $R = \mathbb{C}[x]/I$ are in bijective correspondence with
points of $V$.

Proof. The maximal ideals of $R$ correspond to those maximal ideals of
$\mathbb{C}[x]$ which contain $I$ [Correspondence Theorem (4.3)]. And an ideal will
contain $I$ if and only if it contains the generators $f_1, \ldots, f_r$ of $I$. On the
other hand, the maximal ideal $M_a$ which corresponds to a point $a \in \mathbb{C}^n$ is the
kernel of the substitution map $f(x) \mapsto f(a)$. So $f_i \in M_a$ if and only if
$f_i(a) = 0$, which means that $a \in V$. □

This theorem shows that the algebraic properties of the ring $R$ are closely
connected with the geometry of $V$. In principle, all properties of the system of
polynomial equations

(8.2) $f_1(x) = \cdots = f_r(x) = 0$

are reflected in the structure of the ring $R = \mathbb{C}[x]/(f_1, \ldots, f_r)$. The theory
of this relationship is the field of mathematics called algebraic geometry. We
won't take the time to go very far into it here. The important thing for us to
learn is that geometric properties of the variety provide information about the
ring, and conversely.

The simplest question about a set is whether or not it is empty. So we might
ask whether it is possible for a ring to have no maximal ideals at all. It turns
out that this happens only for the zero ring:

(8.3) Theorem. Let $R$ be a ring. Every ideal $I$ of $R$ which is not the unit ideal
is contained in a maximal ideal.

(8.4) Corollary. The only ring $R$ having no maximal ideals is the zero ring. □

Theorem (8.3) can be proved using the Axiom of Choice, or Zorn's Lemma.
However, for quotients of polynomial rings it is a consequence of the Hilbert Basis
Theorem, which we will prove later [Chapter 12 (5.18)]. Rather than enter into a
discussion of the Axiom of Choice, we will defer further discussion of the proof to
Chapter 12.

If we put Theorems (8.1) and (8.3) together, we obtain another important
corollary:

(8.5) Corollary. Let $f_1, \ldots, f_r$ be polynomials in $\mathbb{C}[x_1, \ldots, x_n]$. If the
system of equations $f_1 = \cdots = f_r = 0$ has no solution in $\mathbb{C}^n$, then 1 is a
linear combination

$1 = \sum_i g_i f_i$

of the $f_i$, with polynomial coefficients $g_i$.

For, if the system has no solution, then Theorem (8.1) tells us that there is no
maximal ideal containing the ideal $I = (f_1, \ldots, f_r)$. By Theorem (8.3), $I$ is the
unit ideal. □

Most choices of three polynomials $f_1, f_2, f_3$ in two variables $x, y$ have no
common solutions. It follows that we can usually express 1 as a linear combination
$1 = p_1 f_1 + p_2 f_2 + p_3 f_3$, where the $p_i$ are polynomials. This is not obvious.
For instance, the ideal generated by

(8.6) $f_1 = x^2 + y^2 - 1, \quad f_2 = x^2 - y + 1, \quad f_3 = xy - 1$

is the unit ideal. This can be proved by showing that the set of equations
$f_1 = f_2 = f_3 = 0$ has no solution in $\mathbb{C}^2$. If we didn't have the
Nullstellensatz, it might take us some time to discover that we could write 1 as a
linear combination, with polynomial coefficients, of these three polynomials.
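
The absence of common zeros in (8.6) can be checked mechanically. The sketch below is one possible route, not the method of the text: the equation $f_2 = 0$ gives $y = x^2 + 1$, and substituting this into $f_1$ and $f_3$ leaves the one-variable polynomials $x^4 + 3x^2$ and $x^3 + x - 1$. A common zero of the system would give a common root of these two, so it is enough to see that their greatest common divisor is a nonzero constant. The helper names `poly_rem` and `poly_gcd` are illustrative.

```python
from fractions import Fraction


def poly_rem(p, q):
    """Remainder of p on division by q.

    Polynomials are coefficient lists, highest degree first, and q must have
    a nonzero leading coefficient.
    """
    p = [Fraction(c) for c in p]
    while len(p) >= len(q):
        factor = p[0] / q[0]
        for i in range(len(q)):
            p[i] -= factor * q[i]
        p.pop(0)  # the leading coefficient is now zero
    while p and p[0] == 0:
        p.pop(0)  # strip any remaining leading zeros
    return p


def poly_gcd(p, q):
    """Euclidean algorithm over the rationals; the result is not normalized."""
    while q:
        p, q = q, poly_rem(p, q)
    return p


# After eliminating y = x**2 + 1:
#   f1 becomes x**4 + 3*x**2,   f3 becomes x**3 + x - 1.
g = poly_gcd([1, 0, 3, 0, 0], [1, 0, 1, -1])
print(len(g) == 1)  # a constant gcd: the two polynomials share no root
```

A constant greatest common divisor means that 1 is a combination of the two one-variable polynomials, so they have no common root in $\mathbb{C}$, and the system (8.6) has no solution.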

The Nullstellensatz has been reformulated in many ways, and actually the one
we gave in the last section is not its original form. Here is the original:

(8.7) Theorem. Classical form of the Nullstellensatz: Let $f_1, \ldots, f_r$ and $g$ be
polynomials in $\mathbb{C}[x_1, \ldots, x_n]$. Let $V$ be the variety of zeros of
$f_1, \ldots, f_r$, and let $I$ be the ideal generated by these polynomials. If $g = 0$
identically on $V$, then some power of $g$ is in the ideal $I$.

Proof. To prove this we study the ring obtained by inverting the polynomial
$g$, by means of the equation $gy = 1$. Assume that $g$ vanishes identically on $V$.
Consider the $r + 1$ polynomials $f_1(x), \ldots, f_r(x), g(x)y - 1$ in the variables
$x_1, \ldots, x_n, y$. The last is the only polynomial which involves the variable $y$.
Notice that these polynomials have no common zero in $\mathbb{C}^{n+1}$. For, if
$f_1, \ldots, f_r$ vanish at a point $(a_1, \ldots, a_n, b) \in \mathbb{C}^{n+1}$, then by hypothesis
$g$ vanishes too, and hence $gy - 1$ takes the value $-1$. Corollary (8.5) applies and
tells us that the polynomials $f_1, \ldots, f_r, gy - 1$ generate the unit ideal in
$\mathbb{C}[x_1, \ldots, x_n, y]$. So we may write

$1 = \sum_i p_i(x, y) f_i(x) + q(x, y)(g(x)y - 1)$.

We substitute $y = 1/g$ into this equation, obtaining

$1 = \sum_i p_i(x, g^{-1}) f_i(x)$.

We now clear denominators in $p_i(x, g^{-1})$, multiplying both sides of the equation
by a sufficiently large power $g^N$ of $g$. This yields the required polynomial
expression

$g(x)^N = \sum_i h_i(x) f_i(x)$,

where $h_i(x) = g(x)^N p_i(x, g^{-1})$. □

It is not easy to get a good feeling for a general algebraic variety in
$\mathbb{C}^n$, but the general shape of a variety in $\mathbb{C}^2$ can be described fairly
simply.

(8.8) Proposition. Two nonzero polynomials $f(x, y), g(x, y)$ in two variables have
only finitely many common zeros, unless they have a nonconstant polynomial factor
in common.

If the degrees of $f$ and $g$ are $m$ and $n$ respectively, the number of common
zeros is bounded by $mn$. This is known as the Bezout bound. For instance, two
conics intersect in at most four points. It is somewhat harder to prove the Bezout
bound than just the finiteness, and we won't give a proof.

Proof of Proposition (8.8). We assume that $f$ and $g$ have no common
nonconstant factor. Let $F$ denote the field of rational functions in $x$, the field
of fractions of the ring $\mathbb{C}[x]$. It is useful to regard $f$ and $g$ as elements of
the polynomial ring $F[y]$ in one variable, because we can use the fact that every
ideal of $F[y]$ is principal. Let $I$ denote the ideal generated by $f, g$ in $F[y]$.
This is a principal ideal, generated by the greatest common divisor $h$ of $f$ and
$g$ in $F[y]$ (3.22). If $f$ and $g$ have no common nonconstant factor in $F[y]$, then
$I$ is the unit ideal.

Our assumption is that $f$ and $g$ have no common factor in $\mathbb{C}[x, y]$, not that
they have no common factor in $F[y]$, so we need to relate these two properties.
Factoring polynomials is one of the topics of the next chapter, so we state the
fact which we need here and defer the proof (see Chapter 11 (3.9)).

(8.9) Lemma. Let $f, g \in \mathbb{C}[x, y]$, and let $F$ be the field of rational functions
in $x$. If $f$ and $g$ have a common factor in $F[y]$ which is not an element of $F$,
then they have a common nonconstant factor in $\mathbb{C}[x, y]$.

We return to the proof of the proposition. Since our two polynomials $f, g$ have
no common factor in $\mathbb{C}[x, y]$, they are relatively prime in $F[y]$, so the ideal
$I$ they generate in $F[y]$ is the unit ideal. We may therefore write $1 = rf + sg$,
where $r, s$ are elements of $F[y]$. Then $r, s$ have denominators which are
polynomials in $x$ alone, and we may clear these denominators, multiplying both
sides of the equation by a suitable polynomial $p(x)$. This results in an equation
of the form

$p(x) = u(x, y)f(x, y) + v(x, y)g(x, y)$,

where $u, v \in \mathbb{C}[x, y]$. It follows from this equation that a common zero of $f$
and $g$ must also be a zero of $p$. But $p$ is a polynomial in $x$ alone, and a
polynomial in one variable has only finitely many roots. So the variable $x$ takes
on only finitely many values at the common zeros of $f, g$. The same thing is true
of the variable $y$. It follows that the common zeros form a finite set. □
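
To see the equation $p(x) = uf + vg$ in a concrete case (the two polynomials below are an illustrative choice, not taken from the text), let $f = x^2 + y^2 - 1$ and $g = y - x$, a circle and a line. Eliminating $y$ by hand gives

```latex
\[
2x^2 - 1 \;=\; 1 \cdot (x^2 + y^2 - 1) \;-\; (y + x)(y - x),
\]
```

so that $p(x) = 2x^2 - 1$, $u = 1$, and $v = -(y + x)$. Any common zero has $x = \pm 1/\sqrt{2}$, and then $g = 0$ forces $y = x$, so there are exactly two common zeros, within the Bezout bound $2 \cdot 1 = 2$.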


This proposition shows that the most interesting varieties in $\mathbb{C}^2$ are those
which are defined as the zeros of a single polynomial $f(x, y)$. These loci are
called algebraic curves, or Riemann surfaces, and their geometry can be quite
subtle. A Riemann surface is two-dimensional, so calling it an algebraic curve
would seem to be a misnomer. This use of the term curve refers to the fact that
such a locus can be described analytically by one complex parameter, near a point.

A rough description of such a variety, when $f$ is irreducible, follows. (A
polynomial is called irreducible if it is not the product of two nonconstant
polynomials.) We regard $f(x, y)$ as a polynomial in $y$ whose coefficients are
polynomials in $x$, say

(8.10) $f(x, y) = u_n(x)y^n + \cdots + u_1(x)y + u_0(x)$,

with $u_i(x) \in \mathbb{C}[x]$.

(8.11) Proposition. Let $f(x, y)$ be an irreducible polynomial in $\mathbb{C}[x, y]$ which
is not a polynomial in $x$ alone, and let $S$ be the locus of zeros of $f$ in
$\mathbb{C}^2$. Let $n$ denote the degree of $f$, as a polynomial in $y$.

(a) For every value $a$ of the variable $x$, there are at most $n$ points of $S$ whose
$x$-coordinate is $a$.

(b) There is a finite set $\Delta$ of values of $x$ such that if $a \notin \Delta$ then there
are exactly $n$ points of $S$ whose $x$-coordinate is $a$.

Proof. Let $a \in \mathbb{C}$, and consider the polynomial $f(a, y)$. The points
$(a, b) \in S$ are those such that $b$ is a root of $f(a, y)$. This polynomial is not
identically zero, because if it were, then $x - a$ would divide each of the
coefficients $u_i(x)$, and hence it would divide $f$. But $f$ is assumed to be
irreducible. Next, the degree of $f(a, y)$ in $y$ is at most $n$, and so it has at
most $n$ roots. It will have fewer than $n$ roots if either

(8.12)

(i) the degree of $f(a, y)$ is less than $n$, or

(ii) the degree of $f(a, y)$ is $n$, but this polynomial has a multiple root.

Case (i) occurs when the leading coefficient $u_n(x)$ vanishes at $a$, that is, when
$a$ is a root of $u_n(x)$. Since $u_n$ is a nonzero polynomial in $x$, there are
finitely many such values. Now a complex number $b$ is a multiple root of a
polynomial $h(y)$ [meaning that $(y - b)^2$ divides $h(y)$] if and only if it is a root
of $h(y)$ and of its derivative $h'(y)$. The proof of this fact is left as an
exercise. In our situation, $h(y) = f(a, y)$. The first variable is fixed, so the
derivative is the partial derivative with respect to $y$. Thus case (ii) occurs at
points $(a, b)$ which are common zeros of $f$ and $\partial f/\partial y$. Note that $f$ does
not divide the partial derivative $\partial f/\partial y$, because the degree of the partial
derivative in $y$ is $n - 1$, which is less than the degree of $f$ in $y$. Since $f$ is
assumed to be irreducible, $f$ and $\partial f/\partial y$ have no nonconstant factor in
common. Proposition (8.8) tells us that there are finitely many common zeros. □

Proposition (8.11) can be summed up by saying that $S$ is an $n$-sheeted covering
of the complex $x$-plane $P$. Since there is a finite set $\Delta$ above which $S$ has
fewer than $n$ sheets, it is called a branched covering. For example, consider the
locus $x^2 + xy^2 - 1 = 0$. This equation has two solutions $y$ for every value of $x$
except $x = 0, \pm 1$. There is no solution with $x = 0$, and there is only one with
$x = 1$ or $-1$. So this locus is a branched double covering of $P$.
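
The count of points above a given $x$ in this example is easy to tabulate. The following sketch solves $x^2 + xy^2 - 1 = 0$ for $y$ at a fixed $x$; the function name `points_above` is an illustrative choice, and the equality test against zero is exact only for inputs such as $x = \pm 1$ where the floating-point arithmetic is exact.

```python
import cmath


def points_above(x):
    """Solutions y of x**2 + x*y**2 - 1 == 0 for a fixed complex number x."""
    if x == 0:
        return []  # the equation reads -1 = 0: no points above x = 0
    c = (1 - x * x) / x  # the equation becomes y**2 = c
    if c == 0:
        return [0j]  # a double root: the two sheets come together
    root = cmath.sqrt(c)
    return [root, -root]


for x in (0, 1, -1, 2, 1j):
    print(x, len(points_above(x)))
```

Two sheets appear above every $x$ other than $0$ and $\pm 1$, as described above.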

Here is the precise definition of a branched covering:

(8.13) Definition. An $n$-sheeted branched covering of the complex plane $P$ is a
topological space $S$ together with a continuous map $\pi: S \to P$, such that

(a) $\pi$ is $n$-to-one on the complement of a finite set $\Delta$ in $P$.

(b) For every point $x_0 \in P - \Delta$, there is an open neighborhood $U$ of $x_0$, so
that $\pi^{-1}(U)$ is made up of $n$ disconnected parts $V_i$
($\pi^{-1}(U) = V_1 \cup \cdots \cup V_n$), each $V_i$ is open in $S$, and $\pi$ maps $V_i$
homeomorphically to $U$.



(8.14) Figure. Part of an $n$-sheeted covering. [Figure: the sheets $V_i$ lying
above the neighborhood $U$.]

(8.15) Corollary. Let $f(x, y)$ be an irreducible polynomial in $\mathbb{C}[x, y]$ which
has degree $n > 0$ in the variable $y$. The Riemann surface of $f(x, y)$ is an
$n$-sheeted branched covering of the plane.

Proof. The fact that the Riemann surface $S$ of $f$ has the first property of a
branched covering is Proposition (8.11). So it remains to verify property (8.13b).
Consider a point $x_0$ at which $f(x_0, y)$ has $n$ roots $y_1, \ldots, y_n$. Then
$(\partial f/\partial y)(x_0, y_1) \neq 0$ because $y_1$ is not a multiple root of $f(x_0, y)$.
The Implicit Function Theorem [Appendix (4.1)] applies and tells us that equation
(8.2) can be solved for $y = a_1(x)$ as a continuous function of $x$ in some
neighborhood $U$ of $x_0$, in such a way that $y_1 = a_1(x_0)$. Similarly, we can solve
for $y = a_i(x)$ such that $y_i = a_i(x_0)$. Cutting down the size of $U$, we may assume
that each $a_i(x)$ is defined on $U$. Since $y_1, \ldots, y_n$ are all distinct and the
$a_i(x)$ are continuous functions, they have no common values provided $U$ is made
sufficiently small.

Consider the graphs of the $n$ continuous functions $a_i$:

(8.16) $V_i = \{(x, a_i(x)) \mid x \in U\}$.

They are disjoint because the $a_i(x)$ have no common values on $U$. The map
$V_i \to U$ is a homeomorphism because it has the continuous inverse function
$U \to V_i$. The inverse sends $x \mapsto (x, a_i(x))$. And

$\pi^{-1}(U) = V_1 \cup \cdots \cup V_n$

because $S$ has at most $n$ points above any $x$, and the $n$ points have been
exhibited as $(x, a_i(x)) \in V_i$. Each of the sets $V_i$ is closed in $U \times \mathbb{C}$,
because it is the set of zeros of the continuous function $y - a_i(x)$. Then $V_i$ is
also closed in the subset $\pi^{-1}(U)$ of $U \times \mathbb{C}$. It follows that $V_1$ is open in
$\pi^{-1}(U)$, because it is the complement of the closed set $V_2 \cup \cdots \cup V_n$.
Since $U$ is open in $\mathbb{C}$, its inverse image $\pi^{-1}(U)$ is open in $S$. Thus $V_1$ is
open in an open subset of $S$, which shows that $V_1$ is open in $S$ too. Similarly,
$V_i$ is open for each $i$. □

We will look at these loci again in Chapter 13. 


In helping geometry, modern algebra is helping itself above all. 

Oscar Zariski 


EXERCISES 


1. Definition of a Ring

1. Prove the following identities in an arbitrary ring $R$.

(a) $0a = 0$ (b) $-a = (-1)a$ (c) $(-a)b = -(ab)$

2. Describe explicitly the smallest subring of the complex numbers which contains
the real cube root of 2.

3. Let $\alpha = \tfrac{1}{2}i$. Prove that the elements of $\mathbb{Z}[\alpha]$ form a dense subset
of the complex plane.

4. Prove that $7 + \sqrt[3]{2}$ and $\sqrt{3} + \sqrt{-5}$ are algebraic numbers.

5. Prove that for all integers $n$, $\cos(2\pi/n)$ is an algebraic number.

6. Let $\mathbb{Q}[\alpha, \beta]$ denote the smallest subring of $\mathbb{C}$ containing
$\mathbb{Q}$, $\alpha = \sqrt{2}$, and $\beta = \sqrt{5}$, and let $\gamma = \alpha + \beta$. Prove that
$\mathbb{Q}[\alpha, \beta] = \mathbb{Q}[\gamma]$.

7. Let $S$ be a subring of $\mathbb{R}$ which is a discrete set in the sense of Chapter 5
(4.3). Prove that $S = \mathbb{Z}$.


8. In each case, decide whether or not $S$ is a subring of $R$.

(a) $S$ is the set of all rational numbers of the form $a/b$, where $b$ is not
divisible by 3, and $R = \mathbb{Q}$.

(b) $S$ is the set of functions which are linear combinations of the functions
$\{1, \cos nt, \sin nt \mid n \in \mathbb{Z}\}$, and $R$ is the set of all functions
$\mathbb{R} \to \mathbb{R}$.

(c) (not commutative) $S$ is the set of real matrices of the form
$\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, and $R$ is the set of all real
$2 \times 2$ matrices.

9. In each case, decide whether the given structure forms a ring. If it is not a
ring, determine which of the ring axioms hold and which fail:

(a) $U$ is an arbitrary set, and $R$ is the set of subsets of $U$. Addition and
multiplication of elements of $R$ are defined by the rules $A + B = A \cup B$ and
$A \cdot B = A \cap B$.

(b) $U$ is an arbitrary set, and $R$ is the set of subsets of $U$. Addition and
multiplication of elements of $R$ are defined by the rules
$A + B = (A \cup B) - (A \cap B)$ and $A \cdot B = A \cap B$.

(c) $R$ is the set of continuous functions $\mathbb{R} \to \mathbb{R}$. Addition and
multiplication are defined by the rules $[f + g](x) = f(x) + g(x)$ and
$[f \circ g](x) = f(g(x))$.

10. Determine all rings which contain the zero ring as a subring.

11. Describe the group of units in each ring.

(a) $\mathbb{Z}/12\mathbb{Z}$ (b) $\mathbb{Z}/7\mathbb{Z}$ (c) $\mathbb{Z}/8\mathbb{Z}$ (d) $\mathbb{Z}/n\mathbb{Z}$

12. Prove that the units in the ring of Gauss integers are $\{\pm 1, \pm i\}$.

13. An element $x$ of a ring $R$ is called nilpotent if some power of $x$ is zero.
Prove that if $x$ is nilpotent, then $1 + x$ is a unit in $R$.

14. Prove that the product set $R \times R'$ of two rings is a ring with
component-wise addition and multiplication:

$(a, a') + (b, b') = (a + b, a' + b')$ and $(a, a')(b, b') = (ab, a'b')$.

This ring is called the product ring.

2. Formal Construction of Integers and Polynomials 

1. Prove that every natural number n except 1 has the form m r for some natural number m, 

2. Prove the following laws for the natural numbers. 

(a) the commutative law for addition 

(b) the associative law for multiplication 

(c) the distributive law 

(d) the cancellation law for addition: if a + = a + c, then b - c 

(e) the cancellation law for multiplication: if ab = ac ， then b = c 

3. The relation < on ℕ can be defined by the rule a < b if b = a + n for some n. Assume
that the elementary properties of addition have been proved.

(a) Prove that if a < b, then a + n < b + n for all n.

(b) Prove that the relation < is transitive.

(c) Prove that if a, b are natural numbers, then precisely one of the following holds:

a < b, a = b, b < a.

(d) Prove that if n ≠ 1, then a < an.

4. Prove the principle of complete induction: Let S be a subset of ℕ with the following
property: If n is a natural number such that m ∈ S for every m < n, then n ∈ S. Then
S = ℕ.

*5. Define the set Z of all integers, using two copies of ℕ and an element representing zero,
define addition and multiplication, and derive the fact that Z is a ring from the properties 
of addition and multiplication of natural numbers. 

6. Let R be a ring. The set of all formal power series p(t) = a_0 + a_1 t + a_2 t^2 + ..., with
a_i ∈ R, forms a ring which is usually denoted by R[[t]]. (By formal power series we
mean that there is no requirement of convergence.)

(a) Prove that the formal power series form a ring.

(b) Prove that a power series p(t) is invertible if and only if a_0 is a unit of R.

7. Prove that the units of the polynomial ring ℝ[x] are the nonzero constant polynomials.

3. Homomorphisms and Ideals 

1. Show that the inverse of a ring isomorphism φ: R → R' is an isomorphism.

2. Prove or disprove: If an ideal I contains a unit, then it is the unit ideal. 

3. For which integers n does x^2 + x + 1 divide x^4 + 3x^3 + x^2 + 6x + 10 in Z/nZ[x]?

4. Prove that in the ring Z[x], (2) ∩ (x) = (2x).

5. Prove the equivalence of the two definitions (3.11) and (3.12) of an ideal. 

6. Is the set of polynomials a_n x^n + a_{n−1} x^{n−1} + ... + a_1 x + a_0 such that 2^{k+1} divides a_k
an ideal in Z[x]?

7. Prove that every nonzero ideal in the ring of Gauss integers contains a nonzero integer. 

8. Describe the kernel of the following maps.

(a) ℝ[x, y] → ℝ defined by f(x, y) ↦ f(0, 0)

(b) ℝ[x] → C defined by f(x) ↦ f(2 + i)

9. Describe the kernel of the map Z[x] → ℝ defined by f(x) ↦ f(1 + √2).

10. Describe the kernel of the homomorphism φ: C[x, y, z] → C[t] defined by φ(x) = t,
φ(y) = t^2, φ(z) = t^3.

11. (a) Prove that the kernel of the homomorphism φ: C[x, y] → C[t] defined by
x ↦ t^2, y ↦ t^3 is the principal ideal generated by the polynomial y^2 − x^3.

(b) Determine the image of φ explicitly.

12. Prove the existence of the homomorphism (3.8).

13. State and prove an analogue of (3.8) when ℝ is replaced by an arbitrary infinite field.

14. Prove that if two rings R, R' are isomorphic, so are the polynomial rings R[x] and
R'[x].

15. Let R be a ring, and let f(y) ∈ R[y] be a polynomial in one variable with coefficients in
R. Prove that the map R[x, y] → R[x, y] defined by x ↦ x + f(y), y ↦ y is an
automorphism of R[x, y].

16. Prove that a polynomial f(x) = Σ a_i x^i can be expanded in powers of x − a:
f(x) = Σ c_i (x − a)^i, and that the coefficients c_i are polynomials in the coefficients a_i,
with integer coefficients.

17. Let R, R' be rings, and let R × R' be their product. Which of the following maps are
ring homomorphisms?

(a) R → R × R', r ↦ (r, 0)

(b) R → R × R, r ↦ (r, r)

(c) R × R' → R, (r_1, r_2) ↦ r_1

(d) R × R → R, (r_1, r_2) ↦ r_1 r_2

(e) R × R → R, (r_1, r_2) ↦ r_1 + r_2

18. (a) Is Z/(10) isomorphic to Z/(2) × Z/(5)?

(b) Is Z/(8) isomorphic to Z/(2) × Z/(4)?

19. Let R be a ring of characteristic p. Prove that the map R → R defined by x ↦ x^p is a
ring homomorphism. This map is called the Frobenius homomorphism.

20. Determine all automorphisms of the ring Z[x].

21. Prove that the map Z → R (3.9) is compatible with multiplication of positive integers.

22. Prove that the characteristic of a field is either zero or a prime integer.

23. Let R be a ring of characteristic p. Prove that if a is nilpotent then 1 + a is unipotent,
that is, some power of 1 + a is equal to 1.

24. (a) The nilradical N of a ring R is the set of its nilpotent elements. Prove that N is an
ideal.

(b) Determine the nilradicals of the rings Z/(12), Z/(n), and Z.

25. (a) Prove Corollary (3.20).

(b) Prove Corollary (3.22).

26. Determine all ideals of the ring of formal power series with real coefficients.

27. Find an ideal in the polynomial ring F[x,y] in two variables which is not principal. 

*28. Let R be a ring, and let I be an ideal of the polynomial ring R[x]. Suppose that the lowest
degree of a nonzero element of I is n and that I contains a monic polynomial of degree n.
Prove that I is a principal ideal.

29. Let I, J be ideals of a ring R. Show by example that I ∪ J need not be an ideal, but
show that I + J = {r ∈ R | r = x + y, with x ∈ I, y ∈ J} is an ideal. This ideal is
called the sum of the ideals I, J.

30. (a) Let I, J be ideals of a ring R. Prove that I ∩ J is an ideal.

(b) Show by example that the set of products {xy | x ∈ I, y ∈ J} need not be an ideal,
but that the set of finite sums Σ x_ν y_ν of products of elements of I and J is an ideal.
This ideal is called the product ideal.

(c) Prove that IJ ⊂ I ∩ J.

(d) Show by example that IJ and I ∩ J need not be equal.

31. Let I, J, J' be ideals in a ring R. Is it true that I(J + J') = IJ + IJ'?

*32. If R is a noncommutative ring, the definition of an ideal is a set I which is closed under
addition and such that if r ∈ R and x ∈ I, then both rx and xr are in I. Show that the
noncommutative ring of n × n real matrices has no proper ideal.

33. Prove or disprove: If a^2 = a for all a in a ring R, then R has characteristic 2.

34. An element e of a ring S is called idempotent if e^2 = e. Note that in a product R × R' of
rings, the element e = (1, 0) is idempotent. The object of this exercise is to prove a
converse.

(a) Prove that if e is idempotent, then e' = 1 − e is also idempotent.

(b) Let e be an idempotent element of a ring S. Prove that the principal ideal eS is a
ring, with identity element e. It will probably not be a subring of S because it will
not contain 1 unless e = 1.

(c) Let e be idempotent, and let e' = 1 − e. Prove that S is isomorphic to the product
ring (eS) × (e'S).

4. Quotient Rings and Relations in a Ring

1. Prove that the image of the homomorphism φ of Proposition (4.9) is the subring
described in the proposition.

2. Determine the structure of the ring Z[x]/(x^2 + 3, p), where (a) p = 3, (b) p = 5.

3. Describe each of the following rings.

(a) Z[x]/(x^2 − 3, 2x + 4) (b) Z[i]/(2 + i)

4. Prove Proposition (4.2). 

5. Let R' be obtained from a ring R by introducing the relation a = 0, and let ψ: R → R'
be the canonical map. Prove the following universal property for this construction: Let
φ: R → R̃ be a ring homomorphism, and assume that φ(a) = 0 in R̃. There is a unique
homomorphism φ': R' → R̃ such that φ' ∘ ψ = φ.

6. Let I, J be ideals in a ring R. Prove that the residue of any element of I ∩ J in R/IJ is
nilpotent.

7. Let I, J be ideals of a ring R such that I + J = R.

(a) Prove that IJ = I ∩ J.

*(b) Prove the Chinese Remainder Theorem: For any pair a, b of elements of R, there is
an element x such that x ≡ a (modulo I) and x ≡ b (modulo J). [The notation
x ≡ a (modulo I) means x − a ∈ I.]

8. Let I, J be ideals of a ring R such that I + J = R and IJ = 0. 

(a) Prove that R is isomorphic to the product (R/I) × (R/J).

(b) Describe the idempotents corresponding to this product decomposition (see exercise 
34, Section 3). 

5. Adjunction of Elements

1. Describe the ring obtained from Z by adjoining an element α satisfying the two relations
2α − 6 = 0 and α − 10 = 0.

2. Suppose we adjoin an element α to ℝ satisfying the relation α^2 = 1. Prove that the
resulting ring is isomorphic to the product ring ℝ × ℝ, and find the element of ℝ × ℝ
which corresponds to α.

3. Describe the ring obtained from the product ring ℝ × ℝ by inverting the element (2, 0).

4. Prove that the elements 1, t − α, (t − α)^2, ..., (t − α)^{n−1} form a C-basis for
C[t]/((t − α)^n).

5. Let α denote the residue of x in the ring R' = Z[x]/(x^4 + x^3 + x^2 + x + 1). Compute
the expressions for (α^3 + α^2 + α)(α + 1) and α^5 in terms of the basis (1, α, α^2, α^3, α^4).

6. In each case, describe the ring obtained from F_2 by adjoining an element α satisfying the
given relation.

(a) α^2 + α + 1 = 0 (b) α^2 + 1 = 0

7. Analyze the ring obtained from Z by adjoining an element α which satisfies the pair of
relations α^3 + α^2 + 1 = 0 and α^2 + α = 0.

8. Let a ∈ R. If we adjoin an element α with the relation α = a, we expect to get back a
ring isomorphic to R. Prove that this is so.

9. Describe the ring obtained from Z/12Z by adjoining an inverse of 2.

10. Determine the structure of the ring R' obtained from Z by adjoining an element α
satisfying each set of relations.

(a) 2α = 6, 6α = 15 (b) 2α = 6, 6α = 18 (c) 2α = 6, 6α = 8

11. Let R = Z/(10). Determine the structure of the ring obtained by adjoining an element α
satisfying each relation.

(a) 2α − 6 = 0 (b) 2α − 5 = 0

12. Let a be a unit in a ring R. Describe the ring R' = R[x]/(ax − 1).

13. (a) Prove that the ring obtained by inverting x in the polynomial ring R[x] is isomorphic
to the ring of Laurent polynomials, as asserted in (5.9).

(b) Do the formal Laurent series Σ_{n=−∞}^{∞} a_n x^n form a ring?

14. Let a be an element of a ring R, and let R' = R[x]/(ax − 1) be the ring obtained by
adjoining an inverse of a to R. Prove that the kernel of the map R → R' is the set of
elements b ∈ R such that a^n b = 0 for some n > 0.

15. Let a be an element of a ring R, and let R' be the ring obtained from R by adjoining an
inverse of a. Prove that R' is the zero ring if and only if a is nilpotent.

16. Let F be a field. Prove that the rings F[x]/(x^2) and F[x]/(x^2 − 1) are isomorphic if and
only if F has characteristic 2.

17. Let R = Z[x]/(2x). Prove that every element of R has a unique expression in the form
a_0 + a_1 x + ... + a_n x^n, where a_i are integers and a_1, ..., a_n are either 0 or 1.

6. Integral Domains and Fraction Fields 

1. Prove that a subring of an integral domain is an integral domain. 

2. Prove that an integral domain with finitely many elements is a field.

3. Let R be an integral domain. Prove that the polynomial ring R[x] is an integral domain.

4. Let R be an integral domain. Prove that the invertible elements of the polynomial ring 
R[x] are the units in R. 

5. Is there an integral domain containing exactly 10 elements? 

6. Prove that the field of fractions of the formal power series ring F[[x]] over a field F is
obtained by inverting the single element x, and describe the elements of this field as
certain power series with negative exponents.

7. Carry out the verification that the equivalence classes of fractions from an integral
domain form a field.

8. A semigroup S is a set with an associative law of composition having an identity
element. Let S be a commutative semigroup which satisfies the cancellation law: ab = ac
implies b = c. Use fractions to prove that S can be embedded into a group.

*9. A subset S of an integral domain R which is closed under multiplication and which does
not contain 0 is called a multiplicative set. Given a multiplicative set S, we define
S-fractions to be elements of the form a/b, where b ∈ S. Show that the equivalence
classes of S-fractions form a ring.

7. Maximal Ideals

1. Prove that the maximal ideals of the ring of integers are the principal ideals generated by 
prime integers. 

2. Determine the maximal ideals of each of the following. 

(a) ℝ × ℝ (b) ℝ[x]/(x^2) (c) ℝ[x]/(x^2 − 3x + 2) (d) ℝ[x]/(x^2 + x + 1)

3. Prove that the ideal (x + y^2, y + x^2 + 2xy^2 + y^4) in C[x, y] is a maximal ideal.

4. Let R be a ring, and let I be an ideal of R. Let M be an ideal of R containing I, and let
M̄ = M/I be the corresponding ideal of R̄ = R/I. Prove that M is maximal if and only if M̄ is.

5. Let I be the principal ideal of C[x, y] generated by the polynomial y^2 + x^3 − 17. Which
of the following sets generate maximal ideals in the quotient ring R = C[x, y]/I?

(a) (x − 1, y − 4) (b) (x + 1, y + 4) (c) (x^3 − 17, y^2)

6. Prove that the ring F_5[x]/(x^2 + x + 1) is a field.

7. Prove that the ring F_2[x]/(x^3 + x + 1) is a field, but that F_3[x]/(x^3 + x + 1) is not a
field.

8. Let R = C[x_1, ..., x_n]/I be a quotient of a polynomial ring over C, and let M be a
maximal ideal of R. Prove that R/M ≈ C.

9. Define a bijective correspondence between maximal ideals of ℝ[x] and points in the
upper half plane.

10. Let R be a ring, with M an ideal of R. Suppose that every element of R which is not in M
is a unit of R. Prove that M is a maximal ideal and that moreover it is the only maximal
ideal of R.

11. Let P be an ideal of a ring R. Prove that R̄ = R/P is an integral domain if and only if
P ≠ R, and that if a, b ∈ R and ab ∈ P, then a ∈ P or b ∈ P. (An ideal P satisfying
these conditions is called a prime ideal.)

12. Let φ: R → R' be a ring homomorphism, and let P' be a prime ideal of R'.

(a) Prove that φ^{-1}(P') is a prime ideal of R.

(b) Give an example in which P' is a maximal ideal, but φ^{-1}(P') is not maximal.

*13. Let R be an integral domain with fraction field F, and let P be a prime ideal of R. Let R_P
be the subset of F defined by

R_P = {a/d | a, d ∈ R, d ∉ P}.

This subset is called the localization of R at P.

(a) Prove that R_P is a subring of F.

(b) Determine all maximal ideals of R_P.

14. Find an example of a "ring without unit element" and an ideal not contained in a
maximal ideal.

8. Algebraic Geometry 

1. Determine the points of intersection of the two complex plane curves in each of the 
following. 

(a) y^2 − x^3 + x^2 = 1, x + y = 1

(b) x^2 + xy + y^2 = 1, x^2 + 2y^2 = 1

(c) y^2 = x^3, xy = 1

(d) x + y + y^2 = 0, x − y + y^2 = 0

(e) x + y^2 = 0, y + x^2 + 2xy^2 + y^4 = 0

2. Prove that two quadratic polynomials f, g in two variables have at most four common
zeros, unless they have a nonconstant factor in common.

3. Derive the Hilbert Nullstellensatz from its classical form (8.7). 

4. Let U, V be varieties in C^n. Prove that U ∪ V and U ∩ V are varieties.

5. Let f_1, ..., f_r; g_1, ..., g_s ∈ C[x_1, ..., x_n], and let U, V be the zeros of {f_1, ..., f_r},
{g_1, ..., g_s} respectively. Prove that if U and V do not meet, then (f_1, ..., f_r; g_1, ..., g_s) is
the unit ideal.

6. Let f = f_1 ... f_m and g = g_1 ... g_n, where f_i, g_j are irreducible polynomials in C[x, y].
Let S_i = {f_i = 0} and T_j = {g_j = 0} be the Riemann surfaces defined by these
polynomials, and let V be the variety f = g = 0. Describe V in terms of S_i, T_j.

7. Prove that the variety defined by a set {f_1, ..., f_r} of polynomials depends only on the
ideal (f_1, ..., f_r) they generate.

8. Let R be a ring containing C as subring.

(a) Show how to make R into a vector space over C.

(b) Assume that R is a finite-dimensional vector space over C and that R contains
exactly one maximal ideal M. Prove that M is the nilradical of R, that is, that M
consists precisely of its nilpotent elements.

9. Prove that the complex conic xy = 1 is homeomorphic to the plane, with one point
deleted. 
10. Prove that every variety in C^2 is the union of finitely many points and algebraic curves.

11. The three polynomials f_1 = x^2 + y^2 − 1, f_2 = x^2 − y + 1, and f_3 = xy − 1 generate
the unit ideal in C[x, y]. Prove this in two ways: (i) by showing that they have no common
zeros, and (ii) by writing 1 as a linear combination of f_1, f_2, f_3, with polynomial
coefficients.

12. (a) Determine the points of intersection of the algebraic curve S: y^2 = x^3 − x^2 and the
line L: y = λx.

(b) Parametrize the points of S as a function of λ.

(c) Relate S to the complex λ-plane, using this parametrization.

*13. The radical of an ideal I is the set of elements r ∈ R such that some power of r is in I.

(a) Prove that the radical of I is an ideal.

(b) Prove that the varieties defined by two sets of polynomials {f_1, ..., f_r}, {g_1, ..., g_s}
are equal if and only if the two ideals (f_1, ..., f_r), (g_1, ..., g_s) have the same radicals.

*14. Let R = C[x_1, ..., x_n]/(f_1, ..., f_m). Let A be a ring containing C as subring. Find a
bijective correspondence between the following sets:

(i) homomorphisms φ: R → A which restrict to the identity on C, and

(ii) n-tuples a = (a_1, ..., a_n) of elements of A which solve the system of equations
f_1 = ... = f_m = 0, that is, such that f_i(a) = 0 for i = 1, ..., m.

Miscellaneous Problems



1. Let F be a field, and let K denote the vector space F^2. Define multiplication by the rules
(a_1, a_2) · (b_1, b_2) = (a_1 b_1 − a_2 b_2, a_1 b_2 + a_2 b_1).

(a) Prove that this law and vector addition make K into a ring.

(b) Prove that K is a field if and only if there is no element in F whose square is −1.

(c) Assume that −1 is a square in F and that F does not have characteristic 2. Prove that
K is isomorphic to the product ring F × F.

2. (a) We can define the derivative of an arbitrary polynomial f(x) with coefficients in a
ring R by the calculus formula (a_n x^n + ... + a_1 x + a_0)' = n a_n x^{n−1} + ... + 1·a_1.
The integer coefficients are interpreted in R using the homomorphism (3.9). Prove
the product formula (fg)' = f'g + fg' and the chain rule (f ∘ g)' = (f' ∘ g)g'.

(b) Let f(x) be a polynomial with coefficients in a field F, and let α be an element of F.
Prove that α is a multiple root of f if and only if it is a common root of f and of its
derivative f'.

(c) Let F = F_5. Determine whether or not the following polynomials have multiple roots
in F: x^15 − x, x^15 − 2x^5 + 1.

3. Let R be a set with two laws of composition satisfying all the ring axioms except the
commutative law for addition. Prove that this law holds by expanding the product
(a + b)(c + d) in two ways using the distributive law.

4. Let R be a ring. Determine the units in the polynomial ring R[x].

5. Let R denote the set of sequences a = (a_1, a_2, a_3, ...) of real numbers which are
eventually constant: a_n = a_{n+1} = ... for sufficiently large n. Addition and multiplication are
component-wise; that is, addition is vector addition and ab = (a_1 b_1, a_2 b_2, ...).

(a) Prove that R is a ring.

(b) Determine the maximal ideals of R.

6. (a) Classify rings R which contain C and have dimension 2 as vector space over C. 

*(b) Do the same as (a) for dimension 3. 
*7. Consider the map φ: C[x, y] → C[x] × C[y] × C[t] defined by f(x, y) ↦
(f(x, 0), f(0, y), f(t, t)). Determine the image of φ explicitly.

8. Let S be a subring of a ring R. The conductor C of S in R is the set of elements α ∈ R
such that αR ⊂ S.

(a) Prove that C is an ideal of R and also an ideal of S.

(b) Prove that C is the largest ideal of S which is also an ideal of R.

(c) Determine the conductor in each of the following three cases: 

(i) R = C[t], S = C[t^2, t^3];

(ii) R = Z[ζ], ζ = ½(−1 + √−3), S = Z[√−3];

(iii) R = C[t, t^{-1}], S = C[t].

9. A line in C^2 is the locus of a linear equation L: {ax + by + c = 0}. Prove that there is a
unique line through two points (x_0, y_0), (x_1, y_1), and also that there is a unique line
through a point (x_0, y_0) with a given tangent direction (u_0, v_0).

10. An algebraic curve C in C^2 is called irreducible if it is the locus of zeros of an irreducible
polynomial f(x, y), one which can not be factored as a product of nonconstant
polynomials. A point p ∈ C is called a singular point of the curve if ∂f/∂x = ∂f/∂y = 0 at p.
Otherwise p is a nonsingular point. Prove that an irreducible curve has only finitely
many singular points.

11. Let L: ax + by + c = 0 be a line and C: {f = 0} a curve in C^2. Assume that b ≠ 0.
Then we can use the equation of the line to eliminate y from the equation f(x, y) = 0 of
C, obtaining a polynomial g(x) in x. Show that its roots are the x-coordinates of the
intersection points.

12. With the notation as in the preceding problem, the multiplicity of intersection of L and C
at a point p = (x_0, y_0) is the multiplicity of x_0 as a root of g(x). The line is called a
tangent line to C at p if the multiplicity of intersection is at least 2. Show that if p is a
nonsingular point of C, then there is a unique tangent line at (x_0, y_0), and compute it.

13. Show that if p is a singular point of a curve C, then the multiplicity of intersection of
every line through p is at least 2.

14. The degree of an irreducible curve C: {f = 0} is defined to be the degree of the
irreducible polynomial f.

(a) Prove that a line L meets C in at most d points, where d is the degree of C, unless C = L.

*(b) Prove that there exist lines which meet C in precisely d points.

15. Determine the singular points of x^3 + y^3 − 3xy = 0.

*16. Prove that an irreducible cubic curve can have at most one singular point. 

*17. A nonsingular point p of a curve C is called a flex point if the tangent line L to C at p has
an intersection of multiplicity at least 3 with C at p.

(a) Prove that the flex points are the nonsingular points of C at which the Hessian 



vanishes. 

(b) Determine the flex points of the cubic curves y^2 − x^3 and y^2 − x^3 + x^2.

*18. Let C be an irreducible cubic curve, and let L be a line joining two flex points of C.
Prove that if L meets C in a third point, then that point is also a flex.

19. Let U = {f_i(x_1, ..., x_m) = 0}, V = {g_j(y_1, ..., y_n) = 0} be two varieties. Show that the
variety defined by the equations {f_i(x) = 0, g_j(y) = 0} in C^{m+n} is the product set U × V.

20. Prove that the locus y = sin x in ℝ^2 doesn't lie on any algebraic curve.

*21. Let f, g be polynomials in C[x, y] with no common factor. Prove that the ring R =
C[x, y]/(f, g) is a finite-dimensional vector space over C.

22. (a) Let s, c denote the functions sin x, cos x on the real line. Prove that the ring ℝ[s, c]
they generate is an integral domain.

(b) Let K = ℝ(s, c) denote the field of fractions of ℝ[s, c]. Prove that the field K is
isomorphic to the field of rational functions ℝ(x).

*23. Let f(x), g(x) be polynomials with coefficients in a ring R with f ≠ 0. Prove that if the
product f(x)g(x) is zero, then there is a nonzero element c ∈ R such that cg(x) = 0.

*24. Let X denote the closed unit interval [0, 1], and let R be the ring of continuous functions
X → ℝ.

(a) Prove that a function f which does not vanish at any point of X is invertible in R.

(b) Let f_1, ..., f_n be functions with no common zero on X. Prove that the ideal generated
by these functions is the unit ideal. (Hint: Consider f_1^2 + ... + f_n^2.)

(c) Establish a bijective correspondence between maximal ideals of R and points on the
interval.

(d) Prove that the maximal ideals containing a function f correspond to points of the
interval at which f = 0.

(e) Generalize these results to functions on an arbitrary compact set X in ℝ^k.

(f) Describe the situation in the case X = ℝ.
1. FACTORIZATION OF INTEGERS AND POLYNOMIALS

This chapter is a study of division in rings. Because it is modeled on properties of
the ring of integers, we will begin by reviewing these properties. Some have been
used without comment in earlier chapters of the book, and some have already been
proved.

The property from which all others follow is division with remainder: If a, b
are integers and a ≠ 0, there exist integers q, r so that

(1.1) b = aq + r,

and 0 ≤ r < |a|. This property is often stated only for positive integers, but we
allow a and b to take on negative values too. That is why we use the absolute value |a|
to bound the remainder. The proof of the existence of (1.1) is a simple induction
argument.
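
As a small illustration of division with remainder for integers of either sign, here is a minimal Python sketch. The function name `divide_with_remainder` is a hypothetical choice made for this example; the code only relies on the fact that Python's `%` operator returns a nonnegative result when the modulus is positive.

```python
def divide_with_remainder(b: int, a: int) -> tuple[int, int]:
    """Return (q, r) with b == a*q + r and 0 <= r < |a|, for a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    r = b % abs(a)       # with a positive modulus, 0 <= r < |a|
    q = (b - r) // a     # exact, because a divides b - r
    return q, r


# The remainder stays nonnegative even when a or b is negative.
for b, a in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    q, r = divide_with_remainder(b, a)
    print(f"{b} = ({a})({q}) + {r}")
```

Bounding the remainder by |a| rather than by a is what lets the same statement cover negative divisors.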

We've already seen some of the most important consequences of division with
remainder, but let us recall them. In Chapter 10, we saw that every subgroup of Z+
is an ideal and that every ideal of Z is principal, that is, it has the form dZ for some
integer d ≥ 0. As was proved in Chapter 2 (2.6), this implies that a greatest common
divisor of a pair of integers a, b exists and that it is an integer linear combination
of a and b. If a and b have no factor in common other than ±1, then 1 is a linear
combination of a and b with integer coefficients:

(1.2) ra + sb = 1,

for some r, s ∈ Z. This implies the fundamental property of prime integers, which
was proved in Chapter 3 (2.8). We restate it here:

(1.3) Proposition. Let p be a prime integer, and let a, b be integers. If p divides
the product ab, then p divides a or b. □
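
The coefficients r, s in relation (1.2) can be computed by repeated division with remainder. The following is a minimal sketch of the extended Euclidean algorithm, offered as one standard way to find them rather than as the argument used in the text; the name `extended_gcd` is hypothetical.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, r, s) with d = gcd(a, b) >= 0 and r*a + s*b == d."""
    # Invariant: old_x*a + old_y*b == old_r and x*a + y*b == rem.
    old_r, rem = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while rem != 0:
        q = old_r // rem
        old_r, rem = rem, old_r - q * rem
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:  # normalize the gcd to be nonnegative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


d, r, s = extended_gcd(15, 28)
print(d, r, s, r * 15 + s * 28)  # the last value equals d
```

When a and b have no common factor other than ±1, the returned d is 1 and (r, s) is a solution of (1.2).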

(1.4) Theorem. Fundamental Theorem of Arithmetic: Every integer a ≠ 0 can be
written as a product

a = c p_1 ... p_k,

where c = ±1, the p_i are positive prime integers, and k ≥ 0. This expression is
unique except for the ordering of the prime factors.

Proof. First, a prime factorization exists. To prove this, it is enough to consider
the case that a is greater than 1. By induction on a, we may assume the existence
proved for all positive integers b < a. Either a is prime, in which case the
product has one factor, or there is a proper divisor b ≠ a. Then a = bb', and also
b' ≠ a. Both b and b' are smaller than a, and by induction they can be factored
into primes. Setting their factorizations side by side gives a factorization of a.
Second, the factorization is unique. Suppose that

±p_1 ... p_n = a = ±q_1 ... q_m.

The signs certainly agree. We apply (1.3), with p = p_1. Since p_1 divides the product
q_1 ... q_m, it divides some q_i, say q_1. Since q_1 is prime, p_1 = q_1. Cancel p_1 and
proceed by induction. □
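
The existence half of the theorem can be made concrete for small integers. The sketch below uses plain trial division, which is an assumption of this example and not the inductive argument of the proof; the name `factor_integer` is hypothetical. It returns the sign c together with the positive primes in increasing order, which is one way to fix the ordering left open by the uniqueness statement.

```python
def factor_integer(a: int) -> tuple[int, list[int]]:
    """Return (c, primes) with a == c * product(primes), c = +1 or -1."""
    if a == 0:
        raise ValueError("zero has no prime factorization")
    c = 1 if a > 0 else -1
    n = abs(a)
    primes = []
    p = 2
    while p * p <= n:
        while n % p == 0:   # strip every factor p before moving on
            primes.append(p)
            n //= p
        p += 1
    if n > 1:               # whatever remains is itself prime
        primes.append(n)
    return c, primes


print(factor_integer(-360))  # (-1, [2, 2, 2, 3, 3, 5])
print(factor_integer(1))     # (1, [])  the case k = 0
```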

The structure of the ring of integers is closely analogous to that of a polynomial
ring F[x] in one variable over a field. Whenever a property of one of these
rings is derived, we should try to find an analogous property of the other. We have
already discussed division with remainder for polynomials in Chapter 10, and we
have seen that every ideal of the polynomial ring F[x] is principal [Chapter 10
(3.21)].

A polynomial p(x) with coefficients in a field F is called irreducible if it is not
constant and if its only divisors of lower degree in F[x] are constants. This means
that the only way that p can be written as a product of two polynomials is p = c p_1,
where c is a constant and p_1 is a constant multiple of p. The irreducible polynomials
are analogous to prime integers. It is customary to normalize them by factoring out
their leading coefficients, so that they become monic.

The proof of the following theorem is similar to the proof of the analogous
statements for the ring of integers:

(1.5) Theorem. Let F be a field, and let F[x] denote the polynomial ring in one
variable over F.

(a) If two polynomials f, g have no common nonconstant factor, then there are
polynomials r, s ∈ F[x] such that rf + sg = 1.

(b) If an irreducible polynomial p ∈ F[x] divides a product fg, then p divides one
of the factors f or g.

(c) Every nonzero polynomial f ∈ F[x] can be written as a product

f = c p_1 ... p_k,

where c is a nonzero constant, the p_i are monic irreducible polynomials in
F[x], and k ≥ 0. This factorization is unique, except for the ordering of the
terms. □

The constant factor c which appears in the third part of this theorem is
analogous to the factor ±1 in (1.4). These are the units in their respective rings. The
unit factors are there because we normalized primes to be positive, and irreducible
polynomials to be monic. We can allow negative primes or nonmonic irreducible
polynomials if we wish. The unit factor can then be absorbed, if k > 0. But this
complicates the statement of uniqueness slightly.
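
Part (a) of Theorem (1.5) can be checked on examples by running the Euclidean algorithm in F[x]. The sketch below takes F = Q, represented with Python's `fractions.Fraction`, and stores a polynomial as a list of coefficients with the lowest degree first. All helper names (`poly`, `trim`, `add`, `scale`, `mul`, `divmod_poly`, `bezout`) are hypothetical choices for this example, and `bezout` assumes that f and g are not both zero.

```python
from fractions import Fraction


def poly(*coeffs):
    """Build a polynomial from coefficients, lowest degree first."""
    return trim([Fraction(c) for c in coeffs])


def trim(p):
    """Drop trailing zero coefficients; [] is the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return trim([x + y for x, y in zip(p, q)])


def scale(p, c):
    return trim([c * x for x in p])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return trim(out)


def divmod_poly(f, g):
    """Return (q, r) with f = g*q + r and deg r < deg g (g nonzero)."""
    g = trim(g)
    if not g:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = trim(f)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        coeff = r[-1] / g[-1]
        q[shift] = coeff
        # subtract coeff * x^shift * g, which cancels the leading term of r
        r = add(r, scale([Fraction(0)] * shift + g, -coeff))
    return trim(q), r


def bezout(f, g):
    """Return (d, r, s) with r*f + s*g = d, where d is the monic gcd."""
    r0, r1 = trim(f), trim(g)
    s0, s1 = [Fraction(1)], []
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, rem = divmod_poly(r0, r1)
        r0, r1 = r1, rem
        s0, s1 = s1, add(s0, scale(mul(q, s1), Fraction(-1)))
        t0, t1 = t1, add(t0, scale(mul(q, t1), Fraction(-1)))
    inv = 1 / r0[-1]  # normalize the gcd to be monic
    return scale(r0, inv), scale(s0, inv), scale(t0, inv)


f = poly(1, 0, 1)   # x^2 + 1
g = poly(-1, 1)     # x - 1
d, r, s = bezout(f, g)
print(d, add(mul(r, f), mul(s, g)))  # both are the constant polynomial 1
```

Since x^2 + 1 and x − 1 have no common nonconstant factor, the computed gcd is the constant 1 and the returned pair (r, s) satisfies rf + sg = 1.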

(1.6) Examples. Over the complex numbers, every polynomial of positive degree
has a root α and therefore has a divisor of the form x − α. So the irreducible
polynomials are linear, and the irreducible factorization of a polynomial has the form

(1.7) f(x) = c(x − α_1) ... (x − α_n),

where α_i are the roots of f(x), repeated as necessary. The uniqueness of this
factorization is not surprising.

When F = ℝ, there are two classes of irreducible polynomials: linear polynomials
and irreducible quadratic polynomials. A real quadratic polynomial
x^2 + bx + c is irreducible if and only if its discriminant b^2 − 4c is negative, in
which case it has a pair of complex conjugate roots. The fact that every irreducible
polynomial over the complex numbers is linear implies that no higher-degree
polynomial is irreducible over the reals. Suppose that a polynomial f(x) has real
coefficients a_i and that α is a complex, nonreal root of f(x). Then the complex
conjugate ᾱ is different from α and is also a root. For, since f is a real polynomial, its
coefficients a_i satisfy the relation a_i = ā_i. Then

f(ᾱ) = a_n ᾱ^n + ... + a_1 ᾱ + a_0 = ā_n ᾱ^n + ... + ā_1 ᾱ + ā_0 = \overline{f(α)} = \overline{0} = 0.

The quadratic polynomial g(x) = (x − α)(x − ᾱ) = x^2 − (α + ᾱ)x + αᾱ has
real coefficients −(α + ᾱ) and αᾱ, and both of its linear factors appear on the right
side of the complex factorization (1.7) of f(x). Thus g(x) divides f(x). So the
factorization of f(x) into irreducible real polynomials is obtained by grouping conjugate
pairs in the complex factorization. □
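
The grouping of a conjugate pair into a real quadratic factor can be tried numerically. This minimal sketch uses Python's built-in complex numbers and floating-point arithmetic, so results are approximate; the function name `real_quadratic_from_root` is hypothetical. The sample root is a nonreal cube root of 1, a root of x^3 − 1 = (x − 1)(x^2 + x + 1).

```python
def real_quadratic_from_root(alpha: complex) -> tuple[float, float, float]:
    """Coefficients (1, b, c) of (x - alpha)(x - conj(alpha)) = x^2 + b x + c."""
    b = -(alpha + alpha.conjugate()).real   # equals -2 Re(alpha)
    c = (alpha * alpha.conjugate()).real    # equals |alpha|^2
    return 1.0, b, c


alpha = complex(-0.5, 3 ** 0.5 / 2)   # a nonreal root of x^3 - 1
_, b, c = real_quadratic_from_root(alpha)
print(b, c)           # approximately 1 and 1, i.e. x^2 + x + 1
print(b * b - 4 * c)  # negative discriminant: irreducible over the reals
```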

Factorization of polynomials is more complicated for polynomials with rational
coefficients than for real or complex polynomials, because there exist irreducible
polynomials in Q[x] of arbitrary degree. For example, x^5 − 3x^4 + 3 is irreducible
in Q[x]. We will see more examples in Section 4. Neither the form of the irreducible
factorization nor its uniqueness is intuitively clear for rational polynomials.

For future reference, we note the following elementary fact: 
(1.8) Proposition. Let F be a field, and let f(x) be a polynomial of degree n with
coefficients in F. Then f has at most n roots in F.

Proof. An element α ∈ F is a root of f if and only if x − α divides f [Chapter
10 (3.20)]. If so, then we can write f(x) = (x − α)q(x), where q(x) is a polynomial
of degree n − 1. If β is another root of f, then f(β) = (β − α)q(β) = 0. Since F
is a field, the product of nonzero elements of F is not zero. So one of the two
elements β − α, q(β) is zero. In the first case β = α, and in the second case β is one
of the roots of q(x). By induction on n, we may assume that q(x) has at most n − 1
roots in F. Then there are at most n possibilities for β. □

The fact that F is a field is crucial to Theorem (1.5) and to Proposition (1.8),
as the following example shows. Let R be the ring Z/8Z. Then in the polynomial
ring R[x], we have

x^2 − 1 = (x + 1)(x − 1) = (x + 3)(x − 3).

The polynomial x^2 − 1 has four roots modulo 8, and its factorization into
irreducible polynomials is not unique.
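
The example modulo 8 is easy to verify by brute force. The sketch below stores a polynomial as a list of integer coefficients with the lowest degree first and works modulo n; the helper names `eval_poly_mod`, `roots_mod`, and `poly_mul_mod` are hypothetical. For contrast it also counts the roots modulo 7, where Z/7Z is a field.

```python
def eval_poly_mod(coeffs, x, n):
    """Evaluate the polynomial at x modulo n using Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % n
    return value


def roots_mod(coeffs, n):
    """List all residues x in 0..n-1 with f(x) = 0 modulo n."""
    return [x for x in range(n) if eval_poly_mod(coeffs, x, n) == 0]


def poly_mul_mod(p, q, n):
    """Multiply two polynomials and reduce the coefficients modulo n."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] = (out[i + j] + a * b) % n
    return out


f = [-1, 0, 1]                              # x^2 - 1
print(roots_mod(f, 8))                      # [1, 3, 5, 7]: four roots, degree 2
print(roots_mod(f, 7))                      # [1, 6]: at most 2 roots over a field
print(poly_mul_mod([1, 1], [-1, 1], 8))     # (x + 1)(x - 1) -> [7, 0, 1]
print(poly_mul_mod([3, 1], [-3, 1], 8))     # (x + 3)(x - 3) -> [7, 0, 1]
```

Both products reduce to the same coefficient list [7, 0, 1], which represents x^2 − 1 modulo 8.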


2. UNIQUE FACTORIZATION DOMAINS, PRINCIPAL IDEAL DOMAINS,
AND EUCLIDEAN DOMAINS

Having seen that factorization of polynomials is analogous to factorization of
integers, it is natural to ask whether other rings can have such properties. Relatively
few such rings exist, but the ring of Gauss integers is one interesting example. This
section explores ways in which various parts of the theory can be extended.

We begin by introducing the terminology used in studying factorization. It is
natural to assume that the given ring R is an integral domain, so that the Cancellation
Law is available, and we will make this assumption throughout. We say that an
element a divides another element b (abbreviated a|b) if b = aq for some q ∈ R.
The element a is a proper divisor of b if b = aq for some q ∈ R and if neither a nor
q is a unit. A nonzero element a of R is called irreducible if it is not a unit and if it
has no proper divisor. Two elements a, a' are called associates if each divides the
other. It is easily seen that a, a' are associates if and only if they differ by a unit
factor, that is, if a' = ua for some unit u.
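
These notions can be tested in the ring of Gauss integers. In the sketch below a Gauss integer x + yi is stored as a pair (x, y); this representation and the function names `divides`, `is_unit`, `are_associates`, and `is_proper_divisor` are assumptions made for the example. Divisibility is decided by computing b/a = b·conj(a)/N(a), where N(a) = x^2 + y^2, and checking that both coordinates are integers.

```python
def divides(a, b):
    """True if b = a*q for some Gauss integer q."""
    (x, y), (u, v) = a, b
    norm = x * x + y * y
    if norm == 0:
        return (u, v) == (0, 0)
    # b * conj(a) = (u*x + v*y) + (v*x - u*y) i
    re = u * x + v * y
    im = v * x - u * y
    return re % norm == 0 and im % norm == 0


def is_unit(a):
    """The units are the elements of norm 1: 1, -1, i, -i."""
    x, y = a
    return x * x + y * y == 1


def are_associates(a, b):
    """Associates divide each other."""
    return divides(a, b) and divides(b, a)


def is_proper_divisor(a, b):
    """a divides b, and neither a nor the quotient is a unit (b nonzero)."""
    return divides(a, b) and not is_unit(a) and not are_associates(a, b)


print(are_associates((1, 1), (1, -1)))    # True: 1 - i = -i(1 + i)
print(is_proper_divisor((1, 1), (2, 0)))  # True: 2 = (1 + i)(1 - i)
print(is_proper_divisor((0, 1), (2, 0)))  # False: i is a unit
```

The quotient b/a is a unit exactly when a and b are associates, which is why `is_proper_divisor` can test for associates instead of computing the quotient.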

The concepts of divisor, unit, and associate can be interpreted in terms of the
principal ideals generated by the elements. Remember that an ideal I is called
principal if it is generated by a single element:

(2.1) I = (a). 

Keep in mind the fact that (a) consists of all elements which are multiples of a, that
is, which are divisible by a. Then

(2.2)   u is a unit  ⇔  (u) = (1)

        a and a′ are associates  ⇔  (a) = (a′)

        a divides b  ⇔  (a) ⊃ (b)

        a is a proper divisor of b  ⇔  (1) > (a) > (b).

The proof of these equivalences is straightforward, and we omit it. 

Now suppose that we hope for a theorem analogous to the Fundamental Theorem
of Arithmetic in an integral domain R. We may divide the statement of the
theorem into two parts. First, a given element a must be a product of irreducible
elements, and second, this product must be essentially unique.

Consider the first part. We assume that our element a is not zero and not a unit;
otherwise we have no hope of writing it as a product of irreducible elements. Then
we attempt to factor a, proceeding as follows: If a is irreducible itself, we are done.
If not, then a has a proper factor, so it decomposes in some way as a product,
a = a_1 b_1, where neither a_1 nor b_1 is a unit. We continue factoring a_1 and b_1 if
possible, and we hope that this procedure terminates; in other words, we hope that after a
finite number of steps all the factors are irreducible. The condition that this
procedure always terminates has a neat description in terms of principal ideals:

(2.3) Proposition. Let R be an integral domain. The following conditions are 
equivalent: 

(a) For every nonzero element a of R which is not a unit, the process of factoring
a terminates after finitely many steps and results in a factorization a = b_1 ⋯ b_k
of a into irreducible elements of R.

(b) R does not contain an infinite increasing chain of principal ideals

(a_1) < (a_2) < (a_3) < ⋯ .

Proof. Suppose that R contains an infinite increasing sequence
(a_1) < (a_2) < ⋯ . Then (a_n) < (1) for every n, because (a_n) < (a_{n+1}) ⊂ (1). Since
(a_{n−1}) < (a_n), a_n is a proper divisor of a_{n−1}, say a_{n−1} = a_n b_n, where a_n, b_n are not
units. This provides a nonterminating sequence of factorizations of a_1: a_1 = a_2 b_2 =
a_3 b_3 b_2 = a_4 b_4 b_3 b_2 = ⋯ . Conversely, such a sequence of factorizations gives us an
increasing chain of ideals. □

The second condition of this proposition is often called the ascending chain 
condition for principal ideals. However, to emphasize the factorization property, we
will say that existence of factorizations holds in R if the equivalent conditions of the 
proposition are true. 
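
In the ring of integers the factoring process described above can be sketched directly. The Python code below is an illustration under our own naming (proper_factorization and factor are not from the text); it splits off any proper factor and recurses, and it terminates because the absolute values of the factors strictly decrease, which is the chain condition in the special case R = Z.

```python
from math import isqrt


def proper_factorization(a):
    """Return (a1, b1) with a = a1 * b1 and neither factor a unit, or None if a is irreducible."""
    n = abs(a)
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d, a // d
    return None


def factor(a):
    """Factor a nonzero non-unit integer into irreducible elements of Z."""
    if a in (0, 1, -1):
        raise ValueError("a must be nonzero and not a unit")
    split = proper_factorization(a)
    if split is None:          # a is irreducible itself, so we are done
        return [a]
    a1, b1 = split             # otherwise continue factoring a1 and b1
    return factor(a1) + factor(b1)


print(factor(360))   # [2, 2, 2, 3, 3, 5]
print(factor(-12))   # [2, 2, -3]
```

The sign of a negative input ends up on one factor, which is harmless because factorizations are only considered up to unit factors.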

It is easy to describe domains in which existence of factorizations fails. One
example is obtained by adjoining all 2^k-th roots of x_1 to the polynomial ring F[x_1]:

(2.4) R = F[x_1, x_2, x_3, …],

with the relations x_2² = x_1, x_3² = x_2, x_4² = x_3, and so on. We can factor the element
x_1 indefinitely in this ring, and correspondingly there is an infinite chain
(x_1) < (x_2) < ⋯ of principal ideals.

It turns out that we need infinitely many generators for a ring to make an ex¬ 
ample such as the one just given, so we will rarely encounter such rings. In practice, 
the second part of the Fundamental Theorem is the one which gives the most trou¬ 
ble. Factorization into irreducible elements will usually be possible, but it will not be 
unique. 

Units in a ring complicate the statement of uniqueness. It is clear that unit
factors should be disregarded, since there is no end to the possibility of adding unit
factors in pairs u u⁻¹. For the same reason, associate factors should be considered
equivalent. The units in the ring of integers are ±1, and in this ring it was natural to
normalize irreducible elements (primes) to be positive; similarly, we may normalize
irreducible polynomials by insisting that they be monic. We don't have a reasonable
way to normalize elements of an arbitrary integral domain, so we will allow some
ambiguity. It is actually neater to work with principal ideals than with elements:
Associates generate the same principal ideal. However, it isn't too cumbersome to use
elements here, and we will stay with them. The importance of ideals will become
clear in the later sections of this chapter.

We will call an integral domain R a unique factorization domain if it has the 
following properties: 

(2.5) 

(i) Existence of factorizations is true for R. In other words, the process of
factoring a nonzero element a which is not a unit terminates after finitely many steps
and yields a factorization a = p_1 ⋯ p_m, where each p_i is irreducible.

(ii) The irreducible factorization of an element is unique in the following sense: If
a is factored in two ways into irreducible elements, say a = p_1 ⋯ p_m =
q_1 ⋯ q_n, then m = n, and with suitable ordering of the factors, p_i is an
associate of q_i for each i.

So in the statement of uniqueness, associate factorizations are considered equivalent.

Here is an example in which uniqueness of factorization is not true. The ring is 
the integral domain 

(2.6) R = Z[√−5].

It consists of all complex numbers of the form a + b√−5, where a, b ∈ Z. The
units in this ring are ±1, and the integer 6 has two essentially different
factorizations in R:

(2.7) 6 = 2·3 = (1 + √−5)(1 − √−5).

It is not hard to show that all four terms 2, 3, 1 + √−5, 1 − √−5 are irreducible
elements of R. Since the units are ±1, the associates of 2 are 2 and −2. So 2 is not an
associate of 1 ± √−5, which shows that the two factorizations are essentially
different and hence that R is not a unique factorization domain.
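
One standard way to check the irreducibility claims, which the text leaves to the reader, uses the norm N(a + b√−5) = a² + 5b². The norm is multiplicative, and the four elements have norms 4, 9, 6, 6, so a proper factor of any of them would have norm 2 or 3. The Python sketch below searches for such elements; the representation of a + b√−5 as a pair (a, b) and all function names are our own assumptions.

```python
from math import isqrt


def norm(a, b):
    """Norm of a + b*sqrt(-5), the square of its complex absolute value."""
    return a * a + 5 * b * b


def multiply(u, v):
    """Product of (a + b*sqrt(-5)) and (c + d*sqrt(-5)) as a pair."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)


def elements_of_norm(n):
    """All pairs (a, b) with a^2 + 5 b^2 = n."""
    bound = isqrt(n)
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]


print(multiply((1, 1), (1, -1)))   # (6, 0): (1 + sqrt(-5))(1 - sqrt(-5)) = 6
print(elements_of_norm(1))         # [(-1, 0), (1, 0)]: the units are +1 and -1
print(elements_of_norm(2))         # []: no element of norm 2
print(elements_of_norm(3))         # []: no element of norm 3
```

Since no element has norm 2 or 3, none of 2, 3, 1 ± √−5 has a proper divisor.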

The crucial property of prime integers is that if a prime divides a product, it
divides one of the factors. We will call an element p of an integral domain R prime if
it has these properties: p is not zero and not a unit, and if p divides a product of
elements of R, it divides one of the factors. These are the properties from which
uniqueness of the factorization is derived.

(2.8) Proposition. Let R be an integral domain. Suppose that existence of
factorizations holds in R. Then R is a unique factorization domain if and only if every
irreducible element is prime.

The proof is a simple extension of the arguments used in (1.3) and (1.4); we leave it
as an exercise. □

It is important to distinguish between the two concepts of irreducible
element and prime element. They are equivalent in unique factorization domains,
but most rings contain irreducible elements which are not prime. For instance, in
the ring R = Z[√−5] considered above, the element 2 has no proper factor,
so it is irreducible. It is not prime because, though it divides the product 6 =
(1 + √−5)(1 − √−5), it does not divide either factor.

Since irreducible elements in a unique factorization domain are prime, the 
phrases irreducible factorization and prime factorization are synonymous. We can 
use them interchangeably when we are working in a unique factorization domain, 
but not otherwise. 

There is a simple way of deciding whether an element a divides another
element b in a unique factorization domain, in terms of their irreducible (or prime)
factorizations.

(2.9) Proposition. Let R be a unique factorization domain, and let a = p_1 ⋯ p_r,
b = q_1 ⋯ q_s be given prime factorizations of two elements of R. Then a divides b in
R if and only if s ≥ r, and with a suitable ordering of the factors q_i of b, p_i is an
associate of q_i for i = 1, …, r. □

(2.10) Corollary. Let R be a unique factorization domain, and let a, b be elements
of R which are not both zero. There exists a greatest common divisor d of a, b, with
the following properties:

(i) d divides a and b;

(ii) if an element e of R divides a and b, then e divides d. □

It follows immediately from the second condition that any two greatest common
divisors of a, b are associates. However, the greatest common divisor need not have
the form ra + sb. For example, we will show in the next section that the integer
polynomial ring Z[x] is a unique factorization domain [see (3.8)]. In this ring, the
elements 2 and x have greatest common divisor 1, but 1 is not a linear combination
of these elements with integer polynomial coefficients.

Another important property of the ring of integers is that every ideal of Z is
principal. An integral domain in which every ideal is principal is called a principal
ideal domain.

(2.11) Proposition.

(a) In an integral domain, a prime element is irreducible.

(b) In a principal ideal domain, an irreducible element is prime.

We leave the proofs of (2.9-2.11) as exercises. □

(2.12) Theorem. A principal ideal domain is a unique factorization domain.

Proof. Suppose that R is a principal ideal domain. Then every irreducible
element of R is prime. So according to Proposition (2.8), we need only prove the
existence of factorizations for R. By Proposition (2.3), this is equivalent to showing that
R contains no infinite increasing chain of principal ideals. We argue by
contradiction. Suppose that (a_1) < (a_2) < (a_3) < ⋯ is such a chain.

(2.13) Lemma. Let R be any ring. The union of an increasing chain of ideals
I_1 ⊂ I_2 ⊂ I_3 ⊂ ⋯ is an ideal.

Proof. Let I denote the union of the chain. If u, v are in I, then they are in I_n
for some n. Then u + v and ru are also in I_n; hence they are in I. □

We apply this lemma to the union I of our chain of principal ideals and use the
hypothesis that R is a principal ideal domain to conclude that I is principal, say
I = (b). Now since b is in the union of the ideals (a_n), it is in one of these ideals.
But if b ∈ (a_n), then (b) ⊂ (a_n), and on the other hand (a_n) ⊂ (a_{n+1}) ⊂ (b).
Therefore (a_n) = (a_{n+1}) = (b). This contradicts the assumption that (a_n) < (a_{n+1}), and
this contradiction completes the proof. □

The converse of Theorem (2.12) is not true. The ring Z[x] of integer polyno¬ 
mials is a unique factorization domain [see (3.8)], but it is not a principal ideal do¬ 
main. 

(2.14) Proposition. 

(a) Let p be a nonzero element of a principal ideal domain R. Then R/(p) is a field
if and only if p is irreducible.

(b) The maximal ideals are the principal ideals generated by irreducible elements. 

Proof. Since an ideal M is maximal if and only if R/M is a field, the two parts
are equivalent. We will prove the second part. A principal ideal (a) contains another
principal ideal (b) if and only if a divides b. The only divisors of an irreducible
element p are the units and the associates of p. Therefore the only principal ideals
which contain (p) are (p) and (1). Since every ideal of R is principal, this shows that
an irreducible element generates a maximal ideal. Conversely, let b be an element
having a proper factorization b = aq, where neither a nor q is a unit. Then
(b) < (a) < (1), and this shows that (b) is not maximal. □


Let us now abstract the procedure of division with remainder. To do so, we 
need a notion of size of an element of a ring. Appropriate measures are 

(2.15)   absolute value, if R = Z,

         degree of a polynomial, if R = F[x],

         (absolute value)², if R = Z[i].

In general, a size function on an integral domain R will be any function

(2.16) σ: R − {0} → {0, 1, 2, …}

from the set of nonzero elements of R to the nonnegative integers. An integral
domain R is a Euclidean domain if there is a size function σ on R such that the division
algorithm holds:

(2.17) Let a, b ∈ R and suppose that a ≠ 0. There are elements q, r ∈ R
such that b = aq + r, and either r = 0 or σ(r) < σ(a).

We do not require the elements q, r to be uniquely determined by a and b.

(2.18) Proposition. The rings Z, F[x], and Z[i] are Euclidean domains. □


The ring of integers and the polynomial ring have already been discussed. Let us
show that the ring of Gauss integers is a Euclidean domain, with size function the
function σ = | |². The elements of Z[i] form a square lattice in the complex plane,
and the multiples of a given element a form a similar lattice, the ideal (a) = Ra. If
we write a = re^{iθ}, then (a) is obtained by rotating through the angle θ followed by
stretching by the factor r = |a|:

(2.19) Figure. * = ideal (a), R = Z[i].

It is clear that for every complex number b, there is at least one point of the lattice
(a) whose square distance from b is ≤ ½|a|². Let that point be aq, and set
r = b − aq. Then |r|² ≤ ½|a|² < |a|², as required. Note that since there may be
more than one choice for the element aq, this division with remainder is not unique.

We could also proceed algebraically. We divide the complex number b by a:
b = aw, where w = x + yi is a complex number, not necessarily a Gauss integer.
Then we choose the nearest Gauss integer point (m, n) to (x, y), writing x =
m + x_0, y = n + y_0, where m, n are integers and x_0, y_0 are real numbers such that
−½ ≤ x_0, y_0 < ½. Then (m + ni)a is the required point of Ra. For, |x_0 + y_0 i|² ≤ ½
and |b − (m + ni)a|² = |a(x_0 + y_0 i)|² ≤ ½|a|².
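
The algebraic procedure translates into exact integer arithmetic. In the Python sketch below, a Gauss integer x + yi is represented as a pair (x, y); this representation and the names gauss_mul, gauss_norm, and gauss_divmod are our own choices for illustration. The quotient b/a is computed as b times the conjugate of a, divided by |a|², and each coordinate is rounded to the nearest integer with halves rounded up, so that −½ ≤ x_0, y_0 < ½ as in the text.

```python
def gauss_mul(u, v):
    """Product of the Gauss integers u = (u1, u2) and v = (v1, v2)."""
    return (u[0] * v[0] - u[1] * v[1], u[0] * v[1] + u[1] * v[0])


def gauss_norm(u):
    """Size function sigma(u) = |u|^2."""
    return u[0] * u[0] + u[1] * u[1]


def gauss_divmod(b, a):
    """Return (q, r) with b = a*q + r and |r|^2 <= |a|^2 / 2."""
    n = gauss_norm(a)
    if n == 0:
        raise ZeroDivisionError("division by zero in Z[i]")
    # b / a = b * conj(a) / |a|^2 = (x_num + y_num * i) / n
    x_num = b[0] * a[0] + b[1] * a[1]
    y_num = b[1] * a[0] - b[0] * a[1]
    # Nearest integers to x_num / n and y_num / n, computed as floor(t + 1/2).
    m = (2 * x_num + n) // (2 * n)
    k = (2 * y_num + n) // (2 * n)
    q = (m, k)
    aq = gauss_mul(a, q)
    r = (b[0] - aq[0], b[1] - aq[1])
    return q, r


q, r = gauss_divmod((7, 3), (2, 1))
print(q, r)   # (3, 0) (1, 0): 7 + 3i = (2 + i) * 3 + 1
```

Repeating this division step gives the Euclidean algorithm in Z[i], exactly as for ordinary integers.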

One can copy the discussion of factorization of integers with minor changes to 
prove this proposition: 

(2.20) Proposition. A Euclidean domain is a principal ideal domain, and hence it
is a unique factorization domain. □

(2.21) Corollary. The rings Z, Z[i], and F[x] (F a field) are principal ideal
domains and unique factorization domains. □

In the ring Z[i] of Gauss integers, the element 3 is irreducible, hence prime,
but 2 and 5 are not irreducible because

(2.22) 2 = (1 + i)(1 − i) and 5 = (2 + i)(2 − i).

These are the prime factorizations of 2 and 5 in Z[i].

There are four units in the ring Z[i], namely {±1, ±i}. So every nonzero
element a of this ring has four associates, namely the elements ±a, ±ia. The
associates of 2 + i, for example, are

2 + i, −2 − i, −1 + 2i, 1 − 2i.

There is no really natural way to normalize primes in Z[i], though if pressed we
would choose the unique associate lying in the first quadrant and not on the
imaginary axis. It is better to accept the ambiguity of (2.5) here or else work with
principal ideals.

3. GAUSS'S LEMMA

Theorem (1.5) applies to the ring Q[x] of polynomials with rational coefficients:
Every polynomial f(x) ∈ Q[x] can be expressed uniquely in the form c p_1 ⋯ p_k,
where c ∈ Q and the p_i are monic polynomials which are irreducible over Q. Now
suppose that a polynomial f(x) has integer coefficients, f(x) ∈ Z[x], and that it factors
in Q[x]. Can it be factored without leaving Z[x]? We are going to prove that it can,
and that Z[x] is a unique factorization domain.

Here is an example of a prime factorization in Z[x]:

6x³ + 9x² + 9x + 3 = 3(2x + 1)(x² + x + 1).

As we see from this example, irreducible factorizations are slightly more
complicated in Z[x] than in Q[x]. First, the prime integers are irreducible elements of
Z[x], so they may appear in the prime factorization of a polynomial. Second, the
factor 2x + 1 isn't monic. If we want to stay with integer coefficients, we can't ask
for monic factors.

The integer factors of a polynomial f(x) = a_n x^n + ⋯ + a_0 in Z[x] are
common divisors of its coefficients a_0, …, a_n. A polynomial f(x) is called primitive if its
coefficients a_0, …, a_n have no common integer factor except for the units ±1 and if
its highest coefficient a_n is positive.

(3.1) Lemma. Every nonzero polynomial f(x) ∈ Q[x] can be written as a
product

f(x) = c f_0(x),

where c is a rational number and f_0(x) is a primitive polynomial in Z[x]. Moreover,
this expression for f is unique. The polynomial f has integer coefficients if and only if
c is an integer. If so, then |c| is the greatest common divisor of the coefficients of f,
and the sign of c is the sign of the leading coefficient of f.

The rational number c which appears in this lemma is called the content of
f(x). If f has integer coefficients, then the content divides f in Z[x]. Also, f is
primitive if and only if its content is 1.

Proof of the Lemma. To find f_0, we first multiply f by an integer to clear the
denominators in its coefficients. This will give us a polynomial f_1 with integer
coefficients. Then we factor out the greatest common divisor of the coefficients of f_1
and adjust the sign of the leading coefficient. The resulting polynomial f_0 is
primitive, and f = c f_0 for some rational number c. This proves existence.

To prove uniqueness, suppose that c f_0(x) = d g_0(x), where c, d ∈ Q and f_0, g_0
are primitive polynomials. We will show that c = d and f_0 = g_0. Clearing
denominators reduces us to the case that c and d are integers. Let {a_i}, {b_i} denote the
coefficients of f_0, g_0 respectively. Then c a_i = d b_i for all i. Since the greatest
common divisor of {a_0, …, a_n} is 1, c is the greatest common divisor of {c a_0, …, c a_n}.
Similarly, d is the greatest common divisor of {d b_0, …, d b_n} = {c a_0, …, c a_n}. Hence
c = ±d and f_0 = ±g_0. Since f_0 and g_0 have positive leading coefficients, f_0 = g_0
and c = d. If f has integer coefficients, clearing of the denominator is not necessary;
hence c is an integer, and up to sign it is the greatest common divisor of the
coefficients, as stated. □
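
The existence part of the proof is an algorithm, and it can be sketched in a few lines of Python. The function name content_and_primitive and the convention that a polynomial is a list of coefficients [a_0, …, a_n] with a_n ≠ 0 are assumptions made for this illustration only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def content_and_primitive(coeffs):
    """Return (c, f0) with f = c * f0, where f0 is a primitive integer polynomial.

    coeffs lists the rational coefficients [a_0, ..., a_n] of f, with a_n != 0.
    """
    coeffs = [Fraction(a) for a in coeffs]
    # Clear denominators: multiply by the least common multiple of the denominators.
    den = reduce(lambda l, a: l * a.denominator // gcd(l, a.denominator), coeffs, 1)
    ints = [int(a * den) for a in coeffs]
    # Factor out the greatest common divisor and adjust the sign of the leading coefficient.
    g = reduce(gcd, ints)
    if ints[-1] < 0:
        g = -g
    f0 = [a // g for a in ints]
    return Fraction(g, den), f0


print(content_and_primitive([3, 9, 9, 6]))
# (Fraction(3, 1), [1, 3, 3, 2]): 6x^3 + 9x^2 + 9x + 3 = 3(2x^3 + 3x^2 + 3x + 1)
print(content_and_primitive([Fraction(-3, 4), Fraction(1, 2)]))
# (Fraction(1, 4), [-3, 2]): x/2 - 3/4 = (1/4)(2x - 3)
```

The content returned is an integer exactly when the input has integer coefficients, in agreement with the lemma.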

As we have already observed, the Substitution Principle gives us a
homomorphism

(3.2) Z[x] → F_p[x],

where F_p = Z/pZ is the field with p elements. This homomorphism sends a
polynomial f(x) = a_m x^m + ⋯ + a_0 to its residue f̄(x) = ā_m x^m + ⋯ + ā_0 modulo p. We
will now use it to prove Gauss's Lemma.

(3.3) Theorem. Gauss's Lemma: A product of primitive polynomials in Z[x] is
primitive.

Proof. Let the polynomials be f and g, and let h = fg be their product. Since
the leading coefficients of f and g are positive, the leading coefficient of h is, too. To
show that h is primitive, we will show that no prime integer p divides all the
coefficients of h(x). This will show that the content of h is 1. Consider the
homomorphism Z[x] → F_p[x] defined above. We have to show that h̄ ≠ 0. Since f is
primitive, its coefficients are not all divisible by p. So f̄ ≠ 0. Similarly, ḡ ≠ 0.
Since the polynomial ring F_p[x] is an integral domain, h̄ = f̄ḡ ≠ 0, as required. □

(3.4) Proposition. 

(a) Let f, g be polynomials in Q[x], and let f_0, g_0 be the associated primitive
polynomials in Z[x]. If f divides g in Q[x], then f_0 divides g_0 in Z[x].

(b) Let f be a primitive polynomial in Z[x], and let g be any polynomial with
integer coefficients. Suppose that f divides g in Q[x], say g = fq, with
q ∈ Q[x]. Then q ∈ Z[x], and hence f divides g in Z[x].

(c) Let f, g be polynomials in Z[x]. If they have a common nonconstant factor in
Q[x], then they have a common nonconstant factor in Z[x] too.

Proof. To prove (a), we may clear denominators so that f and g become
primitive. Then (a) is a consequence of (b). To prove (b), we apply (3.1) in order to write
the quotient in the form q = c q_0, where q_0 is primitive and c ∈ Q. By Gauss's
Lemma, f q_0 is primitive, and the equation g = c f q_0 shows that it is the primitive
polynomial g_0 associated to g. Therefore g = c g_0 is the expression for g referred to
in Lemma (3.1), and c is the content of g. Since g ∈ Z[x], it follows that c ∈ Z,
hence that q ∈ Z[x]. Finally, to prove (c), suppose that f, g have a common factor h
in Q[x]. We may assume that h is primitive, and then by (b) h divides both f and g
in Z[x]. □

(3.5) Corollary. If a nonconstant polynomial f is irreducible in Z[x], then it is
irreducible in Q[x]. □

(3.6) Proposition. Let f be an integer polynomial with positive leading
coefficient. Then f is irreducible in Z[x] if and only if either

(i) f is a prime integer, or

(ii) f is a primitive polynomial which is irreducible in Q[x].

Proof. Suppose that f is irreducible. As in Lemma (3.1), we may write
f = c f_0, where f_0 is primitive. Since f is irreducible, this cannot be a proper
factorization. So either c or f_0 is 1. If f_0 = 1, then f is constant, and to be irreducible, a
constant polynomial must be a prime integer. If c = 1, then f is primitive, and is
irreducible in Q[x] by the previous corollary. The converse, that integer primes and
primitive irreducible polynomials are irreducible elements of Z[x], is clear. □

(3.7) Proposition. Every irreducible element of Z[x] is a prime element.

Proof. Let f be irreducible, and suppose f divides gh, where g, h ∈ Z[x].

Case 1: f = p is a prime integer. Write g = c g_0 and h = d h_0 as in (3.1). Then g_0 h_0
is primitive, and hence some coefficient a of g_0 h_0 is not divisible by p. But since p
divides gh, the corresponding coefficient, which is cda, is divisible by p. Hence p
divides c or d, so p divides g or h.

Case 2: f is a primitive polynomial which is irreducible in Q[x]. By (2.11b), f is a
prime element of Q[x]. Hence f divides g or h in Q[x]. By (3.4), f divides g or h in
Z[x]. □

(3.8) Theorem. The polynomial ring Z[x] is a unique factorization domain. Every
nonzero polynomial f(x) ∈ Z[x] which is not ±1 can be written as a product

f(x) = ±p_1 ⋯ p_m q_1(x) ⋯ q_n(x),

where the p_i are prime integers and the q_i(x) are irreducible primitive polynomials.
This expression is unique up to arrangement of the factors.

Existence of factorizations is easy to prove for Z[x], so this theorem follows from
Propositions (3.7) and (2.8). □

Now let R be any unique factorization domain, and let F be its field of fractions
[Chapter 10 (6.5)]. Then R[x] is a subring of F[x], and the results of this section
can be copied, replacing Z by R and Q by F throughout. The only change to be
made is that instead of normalizing primitive polynomials it is better to allow
ambiguity caused by unit factors, as in the previous section. The main results are these:

(3.9) Theorem. Let R be a unique factorization domain with field of fractions F.

(a) Let f, g be polynomials in F[x], and let f_0, g_0 be the associated primitive
polynomials in R[x]. If f divides g in F[x], then f_0 divides g_0 in R[x].

(b) Let f be a primitive polynomial in R[x], and let g be any polynomial in R[x].
Suppose that f divides g in F[x], say g = fq, with q ∈ F[x]. Then q ∈ R[x],
and hence f divides g in R[x].

(c) Let f, g be polynomials in R[x]. If they have a common nonconstant factor in
F[x], then they have a common nonconstant factor in R[x] too.

(d) If a nonconstant polynomial f is irreducible in R[x], then it is irreducible in
F[x].

(e) R[x] is a unique factorization domain.

The proof of Theorem (3.9) follows the pattern established for the ring Z[x], and we
omit it. □

Since R[x_1, …, x_n] ≅ R[x_1, …, x_{n−1}][x_n], we obtain this corollary:

(3.10) Corollary. The polynomial rings Z[x_1, …, x_n] and F[x_1, …, x_n], where F is
a field, are unique factorization domains. □

So the ring C[x, y] of complex polynomials in two variables is a unique
factorization domain. In contrast to the case of one variable, however, where every
complex polynomial is a product of linear ones, complex polynomials in two variables
are often irreducible, and hence prime.

The irreducibility of a polynomial f(x, y) can sometimes be proved by studying
the locus W = {f(x, y) = 0} in C². Suppose that f factors, say

f(x, y) = g(x, y)h(x, y),

where g, h are nonconstant polynomials. Then f(x, y) = 0 if and only if one of the
two equations g(x, y) = 0 or h(x, y) = 0 holds. So if we let U = {g(x, y) = 0},
V = {h(x, y) = 0} denote these two varieties in C², then

W = U ∪ V.

It may be possible to see geometrically that W has no such decomposition.

For example, we can use this method to show that the polynomial

f(x, y) = x² + y² − 1

is irreducible. Since the total degree of f is 2, any proper factor of f has to be linear,
of the form g(x, y) = ax + by + c. And the solutions to a linear equation lie on a
line, whereas {f = 0} is a circle. Of course when we speak of lines and circles, we
are actually talking about the real loci in R². So this reasoning shows that f is
irreducible in R[x, y]. But in fact, the real locus of a circle has enough points to show
irreducibility in C[x, y] too. Suppose that f = gh in C[x, y], where g and h are
linear as before. Then every point of the real circle x² + y² − 1 = 0 lies on one of
the complex loci U, V. So at least one of these loci contains two real points. There is
exactly one complex line (a line being the locus of solutions of a linear equation
ax + by + c = 0) which passes through two given points, and if these points are
real, the linear equation defining the line is also real, up to a constant factor. This is
proved by writing down the equation of a line through two points explicitly. So if f
has a linear factor, then it has a real one. But the circle does not contain a line.

One can also prove that x² + y² − 1 is irreducible algebraically, using the
method of undetermined coefficients (see Section 4, exercise 17).


4. EXPLICIT FACTORIZATION OF POLYNOMIALS

We now pose the problem of determining the factors of a given integer polynomial

(4.1) f(x) = a_n x^n + ⋯ + a_1 x + a_0.

What we want are the irreducible factors in Q[x], and by (3.5) this amounts to
determining the irreducible factors in Z[x]. Linear factors can be found fairly easily. If
b_1 x + b_0 divides f(x), then b_1 divides a_n and b_0 divides a_0. There are finitely many
integers which divide a_n and a_0, so we can try all possibilities. In each case, we
carry out the division and determine whether the remainder is zero. Or we may
substitute the rational number r = −b_0/b_1 into f(x) to see if it is a root.
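
The search for linear factors just described can be sketched as follows. This is an illustration only: the names divisors and rational_roots are ours, a polynomial is assumed to be given as the list [a_0, …, a_n] of its integer coefficients with a_n ≠ 0, and the case a_0 = 0 is handled by first removing factors of x.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational roots of f = sum(coeffs[k] * x**k), found by trying r = -b0/b1."""
    coeffs = list(coeffs)
    roots = set()
    while len(coeffs) > 1 and coeffs[0] == 0:   # x divides f, so 0 is a root
        roots.add(Fraction(0))
        coeffs = coeffs[1:]
    a0, an = coeffs[0], coeffs[-1]
    for b1 in divisors(an):                     # b1 divides a_n
        for b0 in divisors(a0):                 # b0 divides a_0
            for r in (Fraction(-b0, b1), Fraction(b0, b1)):
                value = Fraction(0)
                for a in reversed(coeffs):      # Horner evaluation of f(r)
                    value = value * r + a
                if value == 0:
                    roots.add(r)
    return sorted(roots)


print(rational_roots([3, 9, 9, 6]))   # [Fraction(-1, 2)]: the factor 2x + 1
print(rational_roots([3, 6, 0, 1]))   # []: x^3 + 6x + 3 has no linear factor
```

Each root r = −b_0/b_1 found this way corresponds to a linear factor b_1 x + b_0 of f in Q[x].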

Though things are not so clear for factors of higher degree, Kronecker showed
that the factors can be determined with a finite number of computations. His method
is based on the Lagrange interpolation formula. Unfortunately this method requires
too many steps to be practical except for factors of low degree, and a lot of work has
been done on the problem of efficient computation. One of the most useful methods
is computation modulo p, using the homomorphism Z[x] → F_p[x]. If our
polynomial f(x) factors in Z[x]: f = gh, then its residue f̄(x) modulo p also factors:
f̄ = ḡh̄. And since there are only finitely many polynomials of each degree in F_p[x],
all factorizations there can be carried out in finitely many steps.

(4.2) Proposition. Let f(x) = a_n x^n + ⋯ + a_0 ∈ Z[x] be an integer polynomial,
and let p be a prime integer which does not divide a_n. If the residue f̄ of f modulo p
is irreducible, then f is irreducible in Q[x].

Proof. This follows from an inspection of the homomorphism. We need the
assumption that p does not divide a_n in order to rule out the possibility that a factor g
of f could reduce to a constant in F_p[x]. This assumption is preserved if we replace f
by the associated primitive polynomial. So we may assume that f is primitive. Since
p does not divide a_n, the degrees of f and f̄ are equal. If f factors in Q[x], then it
also factors in Z[x], by Corollary (3.5). Let f = gh be a proper factorization in
Z[x]. Since f is primitive, g and h have positive degree. Since deg f̄ = deg f and
f̄ = ḡh̄, it follows that deg ḡ = deg g and deg h̄ = deg h, hence that f̄ = ḡh̄ is a
proper factorization, which shows that f̄ is reducible. □

Suppose we suspect that a given polynomial f(x) ∈ Z[x] is irreducible. Then
we can try reduction modulo p for a few low primes, p = 2 or 3 for instance, and
hope that f̄ turns out to be of the same degree and irreducible. If so, we will have
proved that f is irreducible too. Note also that since F_p is a field, the results of
Theorem (1.5) hold for the ring F_p[x].

Unfortunately, there exist integer polynomials which are irreducible, though
they can be factored modulo p for every prime p. The polynomial x⁴ − 10x² + 1 is
an example. So the method of reduction modulo p will not always work. But it does
work quite often.
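
For small primes the behaviour of x⁴ − 10x² + 1 can be observed by a direct search. The Python sketch below, whose function name and search strategy are our own, looks for residues a, b, c, d with x⁴ − 10x² + 1 ≡ (x² + ax + b)(x² + cx + d) modulo p by comparing coefficients. It only checks the primes it is given and proves nothing about all primes.

```python
def split_into_quadratics(p):
    """Find (a, b, c, d) with x^4 - 10x^2 + 1 = (x^2 + a x + b)(x^2 + c x + d) mod p, or None."""
    target_x2 = (-10) % p
    for a in range(p):
        c = (-a) % p                                    # x^3 coefficient: a + c = 0
        for b in range(p):
            for d in range(p):
                if ((b + d + a * c) % p == target_x2    # x^2 coefficient
                        and (a * d + b * c) % p == 0    # x coefficient
                        and (b * d) % p == 1):          # constant term
                    return a, b, c, d
    return None


for p in [2, 3, 5, 7, 11, 13]:
    print(p, split_into_quadratics(p))
```

For p = 2 the search returns (0, 1, 0, 1), reflecting x⁴ + 1 = (x² + 1)² in F_2[x].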

The irreducible polynomials in F_p[x] can be found by the "sieve" method. The
sieve of Eratosthenes is the name given to the following method of determining the
primes less than a given number n. We list the integers from 2 to n. The first one, 2,
is prime because any proper factor of 2 must be smaller than 2, and there is no
smaller integer on the list. We make a note of the fact that 2 is prime, and then we
cross out the multiples of 2 from our list. Except for 2 itself, they are not prime. The
first integer which is left, 3, is a prime because it isn't divisible by any smaller
prime. We note that 3 is a prime and then cross out the multiples of 3 from our list.
Again, the smallest remaining integer, 5, is a prime, and so on.

2 3 (4) 5 (6) 7 (8) (9) (10) 11 (12) 13 (14) (15) (16) 17 (18) 19 ⋯
[crossed-out integers are shown in parentheses]

This method will also determine the irreducible polynomials in F_p[x]. We list
all polynomials, degree by degree, and then cross out products. For example, the
linear polynomials in F_2[x] are x and x + 1. They are irreducible. The polynomials
of degree 2 are x², x² + x, x² + 1, and x² + x + 1. The first three are divisible by
x or by x + 1, so the last one is the only irreducible polynomial of degree 2 over F_2.

(4.3) The irreducible polynomials of degree ≤ 4 over F_2:

x, x + 1; x² + x + 1; x³ + x² + 1, x³ + x + 1;

x⁴ + x³ + 1, x⁴ + x + 1, x⁴ + x³ + x² + x + 1.

By trying the polynomials on this list, we can factor all polynomials of degree 9 or
less in F_2[x].

As a sample application of (4.2), the polynomial x⁴ − 6x³ + 12x² − 3x + 9 is
irreducible in Q[x], because its residue in F_2[x] is x⁴ + x + 1.
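
The sieve over F_2 and the sample application can both be reproduced by a short program. In the Python sketch below a polynomial over F_2 is encoded as an integer whose k-th binary digit is the coefficient of x^k; this encoding and the function names are our own assumptions, not part of the text.

```python
def gf2_mul(a, b):
    """Product of two polynomials over F_2 encoded as bitmasks (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def irreducibles_gf2(max_degree):
    """List the polynomials of degree 1..max_degree and cross out all products."""
    limit = 1 << (max_degree + 1)       # bitmasks below limit have degree <= max_degree
    crossed_out = set()
    for a in range(2, limit):           # 0 and 1 are constants, so start at x
        for b in range(a, limit):
            product = gf2_mul(a, b)
            if product >= limit:        # the degree only grows with b, so stop here
                break
            crossed_out.add(product)
    return [f for f in range(2, limit) if f not in crossed_out]


def to_text(f):
    """Readable form of a bitmask-encoded polynomial."""
    terms = []
    for k in range(f.bit_length() - 1, -1, -1):
        if (f >> k) & 1:
            terms.append("1" if k == 0 else "x" if k == 1 else f"x^{k}")
    return " + ".join(terms)


for f in irreducibles_gf2(4):
    print(to_text(f))

# Residue of x^4 - 6x^3 + 12x^2 - 3x + 9 modulo 2, coefficients listed as [a_0, ..., a_4].
residue = sum((c % 2) << k for k, c in enumerate([9, -3, 12, -6, 1]))
print(to_text(residue), residue in irreducibles_gf2(4))   # x^4 + x + 1 True
```

The eight polynomials printed by the loop are the ones listed in (4.3).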

(4.4) The monic irreducible polynomials of degree 2 over F_3:

x² + 1, x² + x − 1, x² − x − 1.

Reduction modulo p may help describe the factorization of a polynomial even
though the residue is reducible. Consider the polynomial f(x) = x³ + 6x + 3 for
instance. Reducing modulo 3, we obtain x³. This doesn't look like a promising tool.
However, suppose that f(x) were reducible, say (ax + b)(cx² + dx + e) =
x³ + 6x + 3. Then the residue of ax + b would have to divide x³ in F_3[x], which
would imply b ≡ 0 (modulo 3). Similarly, we could conclude e ≡ 0 (modulo 3). It
is impossible to satisfy both of these conditions, because be = 3. Therefore no such
factorization exists, and f(x) is irreducible.

The principle at work in this example is called the Eisenstein Criterion. 

(4.5) Proposition. Eisenstein Criterion: Let f(x) = a_n x^n + ⋯ + a_0 ∈ Z[x] be
an integer polynomial, and let p be a prime integer. Suppose that the coefficients of
f satisfy the following conditions:

(i) p does not divide a_n;

(ii) p divides the other coefficients a_{n−1}, …, a_0;

(iii) p² does not divide a_0.

Then f is irreducible in Q[x]. If f is primitive, it is irreducible in Z[x].

For example, x⁴ + 50x² + 30x + 20 is irreducible in Q[x] and in Z[x].

Proof of the Eisenstein Criterion. Assume that the conditions are met for f. Let
f̄ denote the residue modulo p. The hypotheses (i) and (ii) imply that f̄ = ā_n x^n and
that ā_n ≠ 0. If f is reducible in Q[x], then it will factor in Z[x] into factors of
positive degree, say f = gh. Then ḡ and h̄ divide ā_n x^n, and hence each of these
polynomials is a monomial. Therefore all coefficients of g and of h, except the highest
ones, are divisible by p. Let the constant coefficients of g, h be b_0, c_0. Then the
constant coefficient of f is a_0 = b_0 c_0. Since p divides b_0 and c_0, it follows that p² divides
a_0, which contradicts (iii). This shows that f is irreducible. The last assertion follows
from (3.6). □
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One of the most important applications of the Eisenstein Criterion is to prove 
the irreducibility of the cyclotomic polynomial x p ~ l + x p ~ 2 + + jc + 1 ， whose 

roots are the pth roots of unity，the powers of ^ = e 2 ^ 1 /p: 

(4.6) Corollary. Let $p$ be a prime. The polynomial $f(x) = x^{p-1} + x^{p-2} + \cdots +
x + 1$ is irreducible in $\mathbb{Q}[x]$.

Proof. We note that $(x - 1)f(x) = x^p - 1$. Next, we make the substitution
$x = y + 1$ into this product, obtaining

$y f(y + 1) = (y + 1)^p - 1 = y^p + \binom{p}{1} y^{p-1} + \binom{p}{2} y^{p-2} + \cdots + \binom{p}{p-1} y$.

We have $\binom{p}{i} = p(p-1)\cdots(p-i+1)/i!$. If $0 < i < p$, then the prime $p$ isn't a factor
of $i!$, so $i!$ divides the product $(p-1)\cdots(p-i+1)$ of the remaining terms in
the numerator of the integer $\binom{p}{i}$. This implies that $\binom{p}{i}$ is divisible by $p$. Dividing
the expansion of $y f(y + 1)$ by $y$ shows that $f(y + 1)$ satisfies the conditions of the
Eisenstein Criterion (its constant term is $\binom{p}{p-1} = p$), hence that it is an irreducible
polynomial. It follows that $f(x)$ is irreducible too. □
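
The substitution $x = y + 1$ in the proof of (4.6) can be carried out explicitly for small
primes. The sketch below is an illustration with hypothetical function names: it computes
the coefficients of $f(y + 1)$ once from the binomial formula in the proof and once by
expanding $\sum_j (y + 1)^j$ directly, so the two can be compared.

```python
from math import comb

def shifted_cyclotomic(p):
    """Coefficients [c_0, ..., c_{p-1}] of f(y+1), read off from y*f(y+1) = (y+1)^p - 1."""
    return [comb(p, k + 1) for k in range(p)]

def shifted_cyclotomic_direct(p):
    """The same coefficients, from f(y+1) = sum of (y+1)^j for j = 0, ..., p-1."""
    return [sum(comb(j, k) for j in range(k, p)) for k in range(p)]

def satisfies_eisenstein(coeffs, p):
    """Conditions (i)-(iii) of (4.5) for coeffs = [a_0, ..., a_n]."""
    return (coeffs[-1] % p != 0
            and all(c % p == 0 for c in coeffs[:-1])
            and coeffs[0] % (p * p) != 0)
```

For $p = 5$ both functions give `[5, 10, 10, 5, 1]`, that is,
$f(y + 1) = y^4 + 5y^3 + 10y^2 + 10y + 5$, to which the criterion applies with the prime 5.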

It is instructive to examine the statement analogous to the Eisenstein Criterion
when the ring of integers is replaced by the polynomial ring $\mathbb{C}[t]$. Then $\mathbb{Z}[x]$ gets
replaced by $\mathbb{C}[t][x] = \mathbb{C}[t, x]$, the polynomial ring in two variables.

(4.7) Proposition. Let $f(t, x)$ be an element of $\mathbb{C}[t, x]$, written as a polynomial in
$x$ whose coefficients are polynomials in $t$: $f(t, x) = a_n(t)x^n + \cdots + a_1(t)x + a_0(t)$.
Suppose that

(i) $t$ does not divide $a_n(t)$,

(ii) $t$ divides $a_{n-1}(t), \ldots, a_0(t)$,

(iii) $t^2$ does not divide $a_0(t)$.

Then $f(t, x)$ is irreducible in the ring $\mathbb{C}(t)[x]$. If $f$ is primitive, meaning that it has no
factor which is a polynomial in $t$ alone, then $f$ is irreducible in $\mathbb{C}[t, x]$.

This can be proved exactly as we proved (4.5), replacing $\mathbb{F}_p[x]$ by
$\mathbb{C}[x] = \mathbb{C}[t, x]/(t)$. But let us examine the geometry of this situation by considering
the locus $W = \{f(t, x) = 0\}$ in complex 2-space. Conditions (i) and (ii) of (4.7)
imply that $f(0, x) = cx^n$, where $c = a_n(0) \neq 0$. Consequently the only solution of
$f(t, x) = 0$ with $t = 0$ is $t = x = 0$, so the variety $W$ meets the $x$-axis $\{t = 0\}$ only
at the origin.

Suppose that $f(t, x)$ is reducible: $f(t, x) = g(t, x)h(t, x)$. Then $W$ is the union
of the two varieties $U = \{g = 0\}$ and $V = \{h = 0\}$. Also, $cx^n = f(0, x) =
g(0, x)h(0, x)$. Hence $g(0, x)$ is a constant times $x^r$, and $h(0, x)$ is a constant times
$x^{n-r}$, where $r$ is the degree of $g$ in the variable $x$. Therefore $g$ and $h$ both vanish at
the origin. It follows that the origin is a singular point of $W$, meaning that the partial
derivatives $\partial f/\partial x$ and $\partial f/\partial t$ both vanish at $(0, 0)$. This is checked by differentiating
the product $gh$. On the other hand, $\partial f/\partial t\,(0, 0) = da_0/dt\,(0)$, and this is the linear
coefficient of $a_0(t)$. If it vanishes, $t^2$ divides $a_0(t)$, contrary to (4.7iii). □



5. PRIMES IN THE RING OF GAUSS INTEGERS 

We have seen that the ring of Gauss integers is a Euclidean domain. Its units are
$\{\pm 1, \pm i\}$, and every element which is not zero and not a unit is a product of prime
elements. In this section we will study these prime elements, called Gauss primes,
and their relation to prime integers. We looked at some examples in Section 2,
where we saw that the prime integer 5 factors in $\mathbb{Z}[i]$: $5 = (2 + i)(2 - i)$, while 3
does not factor; 3 is a Gauss prime. Remember that since there are four units, there
are four associate factorizations of the integer 5 which we consider equivalent:

$(2 + i)(2 - i) = (-2 - i)(-2 + i) = (1 - 2i)(1 + 2i) = (-1 + 2i)(-1 - 2i)$.

We will now show that the examples 3 and 5 exhibit the two ways that prime
integers can factor in the ring $\mathbb{Z}[i]$. The story is summed up in this theorem:

(5.1) Theorem. 

(a) Let $p$ be a prime integer. Then either $p$ is a Gauss prime, or else it is the
product of two complex conjugate Gauss primes: $p = \pi\bar{\pi}$.

(b) Let $\pi$ be a Gauss prime. Then either $\pi\bar{\pi}$ is a prime integer, or else it is the
square of a prime integer.

(c) The prime integers which are Gauss primes are those congruent to 3 modulo 4;
that is, $p = 3, 7, 11, 19, \ldots$.

(d) Let $p$ be a prime integer. The following are equivalent:

(i) $p$ is a product of two complex conjugate Gauss primes.

(ii) $p$ is the sum of two integer squares: $p = a^2 + b^2$, with $a, b \in \mathbb{Z}$.

(iii) The congruence $x^2 \equiv -1$ (modulo $p$) has an integer solution.

(iv) $p \equiv 1$ (modulo 4), or $p = 2$; that is, $p = 2, 5, 13, 17, \ldots$.


It will take some time to prove all parts of this theorem.
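
Before the proof, the equivalences in part (d) of Theorem (5.1) can be observed numerically
for small primes. The following sketch is only an experiment, not a proof; the function
names `two_squares` and `has_sqrt_minus_one` are hypothetical, and both use exhaustive
search.

```python
from math import isqrt

def two_squares(p):
    """Return (a, b) with a*a + b*b == p, or None if there is no such pair."""
    for a in range(isqrt(p) + 1):
        b2 = p - a * a
        b = isqrt(b2)
        if b * b == b2:
            return (a, b)
    return None

def has_sqrt_minus_one(p):
    """True if the congruence x^2 = -1 (modulo p) has a solution."""
    return any((x * x + 1) % p == 0 for x in range(p))
```

For instance `two_squares(13)` returns `(2, 3)` and `two_squares(7)` returns `None`, in
agreement with $13 \equiv 1$ and $7 \equiv 3$ (modulo 4).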

The following lemma follows directly from the definition of a Gauss integer:

(5.2) Lemma. A Gauss integer which is a real number is an ordinary integer. An
ordinary integer $d$ divides another integer $a$ in $\mathbb{Z}[i]$ if and only if $d$ divides $a$ in $\mathbb{Z}$.
Moreover, $d$ divides a Gauss integer $a + bi$ if and only if $d$ divides both $a$ and $b$.

Now to prove part (a) of the theorem, let $p$ be an integer prime. Then $p$ is not
a unit in the ring $\mathbb{Z}[i]$. Hence it has a Gauss prime divisor, say $\pi = a + bi$, where
$a, b \in \mathbb{Z}$. The complex conjugate $\bar{\pi} = a - bi$ also divides $p$ because $\bar{p} = p$, so
$\pi\bar{\pi} = a^2 + b^2$ divides $p^2$ in the ring of Gauss integers. Being an integer, $\pi\bar{\pi}$ is an
integer divisor of $p^2$, by Lemma (5.2). There are two possibilities: $\pi$ may be an associate
of $p$. In this case, $p$ is a Gauss prime. Otherwise $\pi$ is a proper divisor of $p$ in the ring of
Gauss integers, and then $\pi\bar{\pi}$ is a proper divisor of $p^2$ in the ring $\mathbb{Z}$. Since $\pi\bar{\pi}$ is a positive
integer different from 1, $\pi\bar{\pi} = p$ in this case.

We can turn this argument around to prove (b). Let $\pi$ be a Gauss prime. Then
$\pi\bar{\pi}$ is a positive integer, say $\pi\bar{\pi} = n$. We factor $n$ into primes in the ring of
integers. This factorization will also be a factorization in the Gauss integers, though
not necessarily a prime factorization. Since $\pi$ is a Gauss prime which divides $n$ in
$\mathbb{Z}[i]$, it divides one of the integer prime factors of $n$. Thus it divides an integer prime
$p$. Then $\pi\bar{\pi}$ is an integer divisor of $p^2$, hence $\pi\bar{\pi} = p$ or $p^2$.

Note that part (c) of Theorem (5.1) is a formal consequence of (a) and of the
equivalence of conditions (d)(i) and (d)(iv). So we need not consider part (c) further,
and we now turn to the proof of part (d). It is easy to see that (i) and (ii) of part (d)
are equivalent: Suppose that $p = \pi\bar{\pi}$ for some Gauss prime $\pi = a + bi$. Then
$p = \pi\bar{\pi} = (a + bi)(a - bi) = a^2 + b^2$, so $p$ is a sum of two integer squares.
Conversely, if $p = a^2 + b^2$, then $p = (a + bi)(a - bi)$ provides a factorization of $p$ in
the ring of Gauss integers, which is a prime factorization because of (a).

The equivalence of (d)(i) and (d)(iii) of Theorem (5.1) is harder to prove. To
do so, we go back to the formal construction of the Gauss integers. The ring $\mathbb{Z}[i]$ is
obtained from the ring $\mathbb{Z}$ by adjoining an element $i$ with the relation $i^2 + 1 = 0$. So
there is an isomorphism

(5.3) $\mathbb{Z}[x]/(x^2 + 1) \xrightarrow{\sim} \mathbb{Z}[i]$.

Let $(p)$ denote the principal ideal generated by a prime integer $p$ in the ring of Gauss
integers. Its elements are the Gauss integers $a + bi$ such that $a$ and $b$ are both
divisible by $p$. Denote by $R'$ the quotient ring $\mathbb{Z}[i]/(p)$. Then $R'$ can also be thought of
as the ring obtained by introducing the two relations

(5.4) $x^2 + 1 = 0$ and $p = 0$

into the polynomial ring $\mathbb{Z}[x]$. So we have an isomorphism

(5.5) $\mathbb{Z}[x]/(x^2 + 1, p) \xrightarrow{\sim} \mathbb{Z}[i]/(p) = R'$,

where $(x^2 + 1, p)$ denotes the ideal of $\mathbb{Z}[x]$ generated by the two elements.

(5.6) Lemma. Let $p$ be a prime integer. The following statements are equivalent:

(i) $p$ is a Gauss prime;

(ii) the ring $R' = \mathbb{Z}[i]/(p)$ is a field;

(iii) $x^2 + 1$ is an irreducible polynomial in the ring $\mathbb{F}_p[x]$.

Proof. The equivalence of the first two statements follows from Proposition
(2.14). What we are really after is the equivalence of (i) and (iii), and at first glance,
these two statements do not seem to be related at all. It was in order to obtain this
equivalence that we introduced the auxiliary ring $R'$. The proof is based on the
following elementary but remarkably useful observation, which follows from the Third
Isomorphism Theorem [Chapter 10 (4.3b)]:

(5.7) To construct the ring $R'$, it does not matter which of the two relations
(5.4) is introduced into the ring $\mathbb{Z}[x]$ first.

So let us reverse the order and begin by killing the element $p$. The Substitution
Principle tells us what we will get. The kernel of the homomorphism $\mathbb{Z}[x] \to \mathbb{F}_p[x]$ is
precisely the ideal $p\mathbb{Z}[x]$. Since this map is surjective, it induces an isomorphism

$\mathbb{Z}[x]/p\mathbb{Z}[x] \xrightarrow{\sim} \mathbb{F}_p[x]$.

We now introduce our other relation $x^2 + 1 = 0$ into this ring, interpreting the
coefficients of this polynomial as elements of $\mathbb{F}_p$. The result is an isomorphism

(5.8) $\mathbb{F}_p[x]/(x^2 + 1) \xrightarrow{\sim} R'$.

Proposition (2.14), applied to the ring $\mathbb{F}_p[x]$, shows that $R'$ is a field if and only if
$x^2 + 1$ is irreducible in $\mathbb{F}_p[x]$. □

We can now prove the equivalence of conditions (d)(i) and (d)(iii) of (5.1). We
know by Lemma (5.6) that $p$ is a Gauss prime if and only if $x^2 + 1$ is an irreducible
polynomial in the ring $\mathbb{F}_p[x]$. Since it is a quadratic polynomial, $x^2 + 1$ is reducible
if it has a root in $\mathbb{F}_p$ and irreducible if it has no root. Also, the residue of an integer
$a$ (modulo $p$) is a root of $x^2 + 1$ if and only if $a^2 \equiv -1$ (modulo $p$). Thus the
congruence $x^2 \equiv -1$ (modulo $p$) has a solution if and only if $x^2 + 1$ is reducible modulo
$p$, which happens if and only if $p$ is not a Gauss prime. In view of part (a), the
equivalence of (i) and (iii) follows.

It remains to prove the equivalence of condition (iv) of part (d) with one of the
other conditions. We will show its equivalence with condition (iii). The congruence
$x^2 \equiv -1$ (modulo 2) does have the solution $x = 1$, so it is sufficient to look at the
other primes, that is, at the odd primes. The following lemma does the job:

(5.9) Lemma. Let $p$ be an odd prime, and let $\bar{a}$ denote the residue of an integer $a$
modulo $p$.

(a) The integer $a$ solves the congruence $x^2 \equiv -1$ (modulo $p$) if and only if its
residue $\bar{a}$ is an element of order 4 in the multiplicative group of the field $\mathbb{F}_p$.

(b) The multiplicative group $\mathbb{F}_p^\times$ contains an element of order 4 if and only if
$p \equiv 1$ (modulo 4).

Proof. There is exactly one element of order 2 in $\mathbb{F}_p^\times$, namely the residue of
$-1$. This is because an element of order 2 is a root of the polynomial $x^2 - 1$, and
we know the roots of this polynomial: They are $\pm 1$ in any field [see (1.7)]. If a
residue $\bar{a}$ has order 4 in $\mathbb{F}_p^\times$, then $\bar{a}^2$ has order 2; hence $\bar{a}^2 = -1$, which means
$a^2 \equiv -1$ (modulo $p$). Conversely, if $a^2 \equiv -1$ (modulo $p$), then $\bar{a}$ has order 4 in $\mathbb{F}_p^\times$.
This proves part (a) of the lemma.

Now the order of the group $\mathbb{F}_p^\times$ is $p - 1$. So if this group contains an element
of order 4, then $p - 1$ is divisible by 4, or equivalently $p \equiv 1$ (modulo 4).
Conversely, suppose that $p - 1$ is divisible by 4, and let $H$ be the Sylow 2-subgroup of
$\mathbb{F}_p^\times$, whose order is the largest power $2^r$ of 2 which divides $p - 1$. Since 4 divides
$p - 1$, the order of $H$ is at least 4, so there is an element $\bar{a}$ in $H$ different from $\pm 1$.
This element does not have order 2, nor does it have order 1. But since $H$ is a
2-group, the order of $\bar{a}$ is a power of 2. So some power of $\bar{a}$ has order exactly 4.

This completes the proof of Theorem (5.1). □
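
The second half of the proof of Lemma (5.9) is constructive enough to turn into a small
search. The sketch below is an illustration under stated assumptions: the function name
`element_of_order_4` is hypothetical, $p$ is assumed to be an odd prime, and elements of
the Sylow 2-subgroup $H$ are produced by raising a residue to the odd part of $p - 1$.

```python
def element_of_order_4(p):
    """For an odd prime p, return a residue a with a*a = -1 (mod p), or None if p % 4 != 1."""
    if p % 4 != 1:
        return None
    m = p - 1
    while m % 2 == 0:              # m becomes the odd part of p - 1
        m //= 2
    for g in range(2, p):
        h = pow(g, m, p)           # h lies in the Sylow 2-subgroup H
        if h in (1, p - 1):        # skip the elements of order 1 and 2
            continue
        # the order of h is a power of 2 that is at least 4; square down to order 4
        while (h * h) % p != p - 1:
            h = (h * h) % p
        return h
    return None
```

For example, the search returns 2 for $p = 5$ and 8 for $p = 13$, and indeed
$8^2 = 64 \equiv -1$ (modulo 13).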


6. ALGEBRAIC INTEGERS 

In the next sections we are going to study factorization of algebraic numbers in a
simple but important case, that of quadratic imaginary integers. The ring of Gauss
integers is our model here. It was in order to extend the properties of factorization of
ordinary integers to algebraic numbers that ideals were first introduced, and the
extension is very beautiful.

In contrast to most of the topics we have studied, the arithmetic of quadratic
number fields is not of universal importance. It has many applications to arithmetic,
but not so many in other areas of mathematics. Our reason for including this topic,
aside from its elegance, is its historical importance. Many of our algebraic tools
were first developed in order to extend arithmetic properties of the integers to
algebraic numbers.


A typical application of algebraic numbers to arithmetic is to the problem of
determining integer points on an ellipse such as

(6.1) $x^2 + 5y^2 = p$,

where for simplicity we assume that $p$ is a prime. To determine integer points on the
circle $x^2 + y^2 = p$, we may begin by factoring the left side, obtaining
$(x + iy)(x - iy) = p$, and then use arithmetic in the Gauss integers to analyze the
factorization. We did this in our proof of Theorem (5.1). The analogous procedure
for equation (6.1) leads to

$(x + \sqrt{-5}\,y)(x - \sqrt{-5}\,y) = p$,

so we may attempt an analysis in the ring $\mathbb{Z}[\sqrt{-5}]$. However, as we have seen,
factorization is not unique in this ring. We will have some trouble.
Another example is the famous Fermat Equation

(6.2) $x^3 + y^3 = z^3$.

It was proved by Euler that this equation has no integer solutions, except for the
trivial solutions in which one of the variables is zero. To analyze it, we may bring $y^3$ to
the other side and factor, obtaining

(6.3) $x^3 = (z - y)(z - \zeta y)(z - \bar{\zeta} y)$,

where

(6.4) $\zeta = \frac{1}{2}(-1 + \sqrt{-3}) = e^{2\pi i/3}$

is a complex cube root of 1. One can then analyze this equation using arithmetic in
the ring $\mathbb{Z}[\zeta]$. This ring happens to be a Euclidean domain, so unique factorization is
available. Unfortunately, the proof that (6.2) has no nontrivial solution is fairly
complicated, so we will not give it.

Problems of this type, which ask for integer solutions of polynomial equations,
are called Diophantine problems. We will analyze a few of them in Section 12, when
the necessary tools have been assembled.

A complex number $\alpha$ is called algebraic if it is the root of a nonzero
polynomial $f(x)$ with rational coefficients (Chapter 10, Section 1). We can, of course, clear
denominators in the coefficients of the polynomial $f(x)$. So if $\alpha$ is an algebraic
number, then it is also the root of a polynomial with integer coefficients. The number $\alpha$
is called an algebraic integer if it is the root of a monic polynomial with integer
coefficients, a polynomial of the form

(6.5) $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_0$, with $a_i \in \mathbb{Z}$.

Thus the cube root of unity $\zeta$, being a root of the polynomial $x^3 - 1$, is an algebraic
integer.

Let $\alpha$ be an algebraic number. The set of all polynomials in $\mathbb{Q}[x]$ which have $\alpha$
as a root is the kernel of the substitution homomorphism

$\mathbb{Q}[x] \to \mathbb{C}$, defined by $f(x) \mapsto f(\alpha)$.

So it is a principal ideal, generated by an irreducible element $f(x)$ of the polynomial
ring which is called the irreducible polynomial for $\alpha$ over $\mathbb{Q}$. (Why is $f$ irreducible?)
It is the polynomial of lowest degree having $\alpha$ as a root and is unique up to a
constant factor. The degree of the irreducible polynomial for $\alpha$ is also called the degree
of $\alpha$ over $\mathbb{Q}$.

We may choose this irreducible polynomial $f(x)$ for $\alpha$ to be a primitive
polynomial in $\mathbb{Z}[x]$. Then $f(x)$ also generates the ideal of $\mathbb{Z}[x]$ of all integer polynomials
having $\alpha$ as a root.

(6.6) Proposition. The kernel of the map $\mathbb{Z}[x] \to \mathbb{C}$ sending $x \mapsto \alpha$ is the
principal ideal of $\mathbb{Z}[x]$ generated by the primitive irreducible polynomial for $\alpha$.

Proof. Let $f(x)$ be the primitive irreducible polynomial for $\alpha$. If $g \in \mathbb{Z}[x]$ has
$\alpha$ as a root, then $f$ divides $g$ in $\mathbb{Q}[x]$, and hence $f$ divides $g$ in $\mathbb{Z}[x]$ too, by (3.4). So
$g$ is in the principal ideal of $\mathbb{Z}[x]$ generated by $f$. □

Note that the leading coefficient of a polynomial $f(x)$ divides the leading
coefficient of any multiple in $\mathbb{Z}[x]$. So it follows from Proposition (6.6) that if the
primitive irreducible polynomial $f(x)$ for $\alpha$ is not monic, then $\alpha$ is not the root of
any monic integer polynomial.

(6.7) Proposition. An algebraic number $\alpha$ is an algebraic integer if and only if the
primitive irreducible polynomial for $\alpha$ is monic. Equivalently, $\alpha$ is an algebraic
integer if and only if the monic irreducible polynomial for $\alpha$ in $\mathbb{Q}[x]$ has integer
coefficients. □

The primitive irreducible polynomial for the cube root of unity $\zeta$ is $x^2 + x + 1$.

(6.8) Corollary. A rational number $r$ is an algebraic integer if and only if it is an
ordinary integer.

For, the monic irreducible polynomial over $\mathbb{Q}$ of a rational number $r$ is $x - r$. □

Proposition (6.7) can be used to decide whether or not an algebraic number is
an algebraic integer, provided that we can compute its irreducible polynomial. For
example, $\alpha = \frac{1}{2}(1 + \sqrt{2})$ is a root of $4x^2 - 4x - 1$. This is the primitive
irreducible polynomial for $\alpha$. Hence $\alpha$ is not an algebraic integer.

The concept of algebraic integer was one of the most important discoveries of
number theory. It is not easy to explain quickly why it is the right definition to use,
but roughly speaking, we can think of the leading coefficient of the primitive
irreducible polynomial $f(x)$ for $\alpha$ as a “denominator.” If $\alpha$ is the root of an integer
polynomial $f(x) = dx^n + a_{n-1}x^{n-1} + \cdots + a_0$, then $d\alpha$ is an algebraic integer,
because it is a root of the monic integer polynomial

(6.9) $x^n + a_{n-1}x^{n-1} + da_{n-2}x^{n-2} + \cdots + d^{n-2}a_1 x + d^{n-1}a_0$.

Thus we can “clear the denominator” in any algebraic number $\alpha$ by multiplying it
with a suitable integer to get an algebraic integer. The leading coefficient is,
however, not a precise denominator. Thus if $\alpha = \frac{1}{2}(1 + \sqrt{2})$, then $2\alpha$ is an algebraic
integer, while the leading coefficient of its primitive irreducible polynomial is 4.

In another direction, the example of the algebraic integer $\zeta = \frac{1}{2}(-1 + \sqrt{-3})$
shows that we must not jump to conclusions just because some expression for an
algebraic number has denominators.

Explicit computation with algebraic integers is not very easy. It is a fact that
they form a subring of $\mathbb{C}$, that is, that sums and products of algebraic integers are
algebraic integers, but this isn't obvious. Rather than develop a general theory, we
will work out the case of quadratic extensions explicitly.

A quadratic number field $F = \mathbb{Q}[\sqrt{d}]$ consists of all complex numbers

(6.10) $a + b\sqrt{d}$, with $a, b \in \mathbb{Q}$,

where $d$ is a fixed integer, positive or negative, which is not a rational square. The
notation $\sqrt{d}$ will stand for the positive square root if $d > 0$ and for the positive
imaginary square root if $d < 0$. If $d$ has a square integer factor, we can pull it out of
the radical and put it into $b$ without changing the field. Therefore it is customary to
assume that $d$ is square free, meaning that $d = \pm p_1 \cdots p_r$ where the $p_i$ are distinct
primes, or that $d = -1$. So the values we take are

$d = -1, \pm 2, \pm 3, \pm 5, \pm 6, \pm 7, \pm 10, \ldots$.

The field $F$ is called a real quadratic number field if $d > 0$, or an imaginary
quadratic number field if $d < 0$.

We will now compute the algebraic integers in $F$. The computation for a
special value of $d$ is no simpler than the general case. Nevertheless, you may wish to
substitute a value such as $d = 5$ when going over this computation. We set

(6.11) $\delta = \sqrt{d}$.

When $d$ is negative, $\delta$ is purely imaginary. Let

$\alpha = a + b\delta$

be any element of $F$ which is not in $\mathbb{Q}$, that is, such that $b \neq 0$. Then $\alpha' = a - b\delta$
is also in $F$. If $d$ is negative, $\alpha'$ is the complex conjugate of $\alpha$. Note that $\alpha$ is a root
of the polynomial

(6.12) $(x - \alpha)(x - \alpha') = x^2 - (\alpha + \alpha')x + \alpha\alpha' = x^2 - 2ax + (a^2 - b^2 d)$.

This polynomial has the rational coefficients $-2a$ and $a^2 - b^2 d$. Since $\alpha$ is not a
rational number, it is not the root of a linear polynomial. So (6.12) is irreducible and
is therefore the monic irreducible polynomial for $\alpha$ over $\mathbb{Q}$. According to (6.7), $\alpha$ is
an algebraic integer if and only if (6.12) has integer coefficients. Thus we have the
following corollary:

(6.13) Corollary. $\alpha = a + b\delta$ is an algebraic integer if and only if $2a$ and
$a^2 - b^2 d$ are integers. □

This corollary also holds when $b = 0$, because if $a^2$ is an integer, then so is $a$. If we
like, we can use the conditions of the corollary as a definition of the integers in $F$.

The possibilities for $a$ and $b$ depend on the congruence class of $d$ modulo 4.
Note that since $d$ is assumed to be square free, the case $d \equiv 0$ (modulo 4) has been
ruled out, so $d \equiv 1, 2$, or 3 (modulo 4).

(6.14) Proposition. The algebraic integers in the quadratic field $F = \mathbb{Q}[\sqrt{d}]$ have
the form $\alpha = a + b\delta$, where:

(a) If $d \equiv 2$ or 3 (modulo 4), then $a$ and $b$ are integers.

(b) If $d \equiv 1$ (modulo 4), then either $a, b \in \mathbb{Z}$ or $a, b \in \mathbb{Z} + \frac{1}{2}$.

The cube root of unity $\zeta = \frac{1}{2}(-1 + \sqrt{-3})$ is an example of an algebraic integer of
the second type. On the other hand, since $-1 \equiv 3$ (modulo 4), the integers in the
field $\mathbb{Q}[i]$ are just the Gauss integers.

Proof of the Proposition. Since the coefficients of the irreducible polynomial
(6.12) for $\alpha$ are $-2a$ and $a^2 - b^2 d$, $\alpha$ is certainly an algebraic integer if $a$ and $b$ are
integers. Assume that $d \equiv 1$ (modulo 4) and that $a, b \in \mathbb{Z} + \frac{1}{2}$. (We say that they
are half integers.) Then $2a \in \mathbb{Z}$. To show that $a^2 - b^2 d \in \mathbb{Z}$, we write $a = \frac{1}{2}m$,
$b = \frac{1}{2}n$, where $m, n$ are odd integers. Computing modulo 4, we find

$m^2 - n^2 d \equiv (\pm 1)^2 - (\pm 1)^2 \cdot 1 = 0$ (modulo 4).

Hence $a^2 - b^2 d = \frac{1}{4}(m^2 - n^2 d) \in \mathbb{Z}$, as required.

Conversely, suppose that $\alpha$ is an algebraic integer. Then $2a \in \mathbb{Z}$ by Corollary
(6.13). There are two cases: either $a \in \mathbb{Z}$ or $a \in \mathbb{Z} + \frac{1}{2}$.

Case 1: $a \in \mathbb{Z}$. It follows that $b^2 d \in \mathbb{Z}$ too. Now if we write $b = m/n$, where
$m, n$ are relatively prime integers and $n > 0$, then $b^2 d = m^2 d/n^2$. Since $d$ is square
free, it can't cancel a square in the denominator. So $n = 1$. If $a$ is an integer, $b$ must
be an integer too.

Case 2: $a \in \mathbb{Z} + \frac{1}{2}$ is a half integer, say $a = \frac{1}{2}m$ as before. Then $4a^2 \in \mathbb{Z}$, and the
condition $a^2 - b^2 d \in \mathbb{Z}$ implies that $4b^2 d \in \mathbb{Z}$ but $b^2 d \notin \mathbb{Z}$. Therefore $b$ is also a
half integer, say $b = \frac{1}{2}n$, where $n$ is odd. In order for this pair of values for $a, b$ to
satisfy $a^2 - b^2 d \in \mathbb{Z}$, we must have $m^2 - n^2 d \equiv 0$ (modulo 4). Computing
modulo 4, we find that $d \equiv 1$ (modulo 4). □
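
Corollary (6.13) gives a test that can be run with exact rational arithmetic, and
Proposition (6.14) predicts its outcome. The sketch below is an illustration, not part of
the text: the function names are hypothetical, $d$ is assumed to be square free, and
`Fraction` is used so that no rounding occurs.

```python
from fractions import Fraction

def is_algebraic_integer(a, b, d):
    """Test of (6.13): a + b*sqrt(d) is an algebraic integer iff 2a and a^2 - b^2 d are integers."""
    a, b = Fraction(a), Fraction(b)
    return (2 * a).denominator == 1 and (a * a - b * b * d).denominator == 1

def predicted_by_6_14(a, b, d):
    """Description (6.14): both integers, or both half integers when d = 1 (modulo 4)."""
    a, b = Fraction(a), Fraction(b)
    both_integers = a.denominator == 1 and b.denominator == 1
    both_half_integers = a.denominator == 2 and b.denominator == 2
    return both_integers or (d % 4 == 1 and both_half_integers)
```

For example, `is_algebraic_integer(Fraction(-1, 2), Fraction(1, 2), -3)` is true (this is
the cube root of unity $\zeta$), while the same pair of half integers fails for $d = -5$,
where $a^2 - b^2 d = \frac{3}{2}$. Python's `%` returns a nonnegative remainder, so
`d % 4 == 1` also handles negative $d$.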

A convenient way to write all the integers in the case $d \equiv 1$ (modulo 4) is to
introduce the algebraic integer

(6.15) $\eta = \frac{1}{2}(1 + \delta)$,

which is a root of the monic integer polynomial

(6.16) $x^2 - x + \frac{1}{4}(1 - d)$.

(6.17) Proposition. Assume that $d \equiv 1$ (modulo 4). Then the algebraic integers
in $F = \mathbb{Q}[\sqrt{d}]$ are $a + b\eta$, where $a, b \in \mathbb{Z}$. □

It is easy to show by explicit calculation that the integers in $F$ form a ring $R$ in
each case, called the ring of integers in $F$. Computation in $R$ can be carried out by
high school algebra.

The discriminant of $F$ is defined to be the discriminant of the polynomial
$x^2 - d$ in the case $R = \mathbb{Z}[\delta]$ and the discriminant of the polynomial
$x^2 - x + \frac{1}{4}(1 - d)$ if $R = \mathbb{Z}[\eta]$. This discriminant will be denoted by $D$. Thus

(6.18) $D = \begin{cases} 4d & \text{if } d \equiv 2, 3 \\ d & \text{if } d \equiv 1 \end{cases}$ (modulo 4).

Since $D$ can be computed in terms of $d$, it isn't very important to introduce a separate
notation for it. However, some formulas become independent of the congruence
class when they are expressed in terms of $D$ rather than $d$.
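
The two cases of (6.18) come straight from the usual discriminant $b^2 - 4c$ of a monic
quadratic $x^2 + bx + c$. The short sketch below spells this out; the function names are
hypothetical and $d$ is assumed to be square free.

```python
def quadratic_discriminant(b, c):
    """Discriminant b^2 - 4c of the monic polynomial x^2 + b*x + c."""
    return b * b - 4 * c

def field_discriminant(d):
    """Discriminant D of Q[sqrt(d)] for square-free d, computed from the defining polynomial."""
    if d % 4 == 1:
        # R = Z[eta], where eta is a root of x^2 - x + (1 - d)/4
        return quadratic_discriminant(-1, (1 - d) // 4)
    # R = Z[delta], where delta is a root of x^2 - d
    return quadratic_discriminant(0, -d)
```

So `field_discriminant(-5)` is $-20 = 4d$, while `field_discriminant(-3)` is $-3 = d$.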

The imaginary quadratic case $d < 0$ is slightly easier to treat than the real one,
so we will concentrate on it in the next sections. In the imaginary case, the ring $R$
forms a lattice in the complex plane which is rectangular if $d \equiv 2, 3$ (modulo 4),
and “isosceles triangular” if $d \equiv 1$ (modulo 4). When $d = -1$, $R$ is the ring of
Gauss integers, and the lattice is square. When $d = -3$, the lattice is equilateral
triangular. Two other examples are depicted below.



[Figure panels: $d = -5$ and $d = -7$]

(6.19) Figure. Integers in some imaginary quadratic fields. 

The property of being a lattice is very special to rings such as those we are
considering here, and we will use geometry to analyze them. Thinking of $R$ as a
lattice is also useful for intuition.

It will be helpful to carry along a specific example as we go. We will use the
case $d = -5$ for this purpose. Since $-5 \equiv 3$ (modulo 4), the ring of integers forms
a rectangular lattice, and $R = \mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$.


7. FACTORIZATION IN IMAGINARY QUADRATIC FIELDS

Let $R$ be the ring of integers of an imaginary quadratic number field $F = \mathbb{Q}[\delta]$. If
$\alpha = a + b\delta$ is in $R$, so is its complex conjugate $\bar{\alpha} = a - b\delta$. We call the norm of $\alpha$
the integer

(7.1) $N(\alpha) = \alpha\bar{\alpha}$.

It is also equal to $a^2 - b^2 d$ and to $|\alpha|^2$, and it is the constant term of the irreducible
polynomial for $\alpha$ over $\mathbb{Q}$. Thus $N(\alpha)$ is a positive integer unless $\alpha = 0$. Note that

(7.2) $N(\beta\gamma) = N(\beta)N(\gamma)$.

This formula gives us some control of possible factors of an element $\alpha$ of $R$. Say that
$\alpha = \beta\gamma$. Then both terms on the right side of (7.2) are positive integers. So to
check for factors of $\alpha$, it is enough to look at elements $\beta$ whose norm divides $N(\alpha)$;
this is not too big a job if $a$ and $b$ are reasonably small.

In particular, let us ask for units of $R$:

(7.3) Proposition.

(a) An element $\alpha$ of $R$ is a unit if and only if $N(\alpha) = 1$.

(b) The units of $R$ are $\{\pm 1\}$ unless $d = -1$ or $-3$. If $d = -1$, so that $R$ is the ring
of Gauss integers, the units are $\{\pm 1, \pm i\}$, and if $d = -3$ they are the powers
of the 6th root of unity $\frac{1}{2}(1 + \sqrt{-3})$.

Proof. If $\alpha$ is a unit, then $N(\alpha)N(\alpha^{-1}) = N(1) = 1$. Since $N(\alpha)$ and $N(\alpha^{-1})$
are positive integers, they are both equal to 1. Conversely, if $N(\alpha) = \alpha\bar{\alpha} = 1$, then
$\bar{\alpha} = \alpha^{-1}$. So $\alpha^{-1} \in R$, and $\alpha$ is a unit. Thus $\alpha$ is a unit if and only if it lies on the
unit circle in the complex plane. The second assertion follows from the configuration
of the lattice $R$ [see Figure (6.19)]. □
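
Part (b) of Proposition (7.3) can be confirmed by listing the lattice points of norm 1. The
sketch below is an illustration with hypothetical function names. It assumes $d < 0$ is
square free and writes elements in the basis $(1, \delta)$ when $d \equiv 2, 3$ (modulo 4)
and in the basis $(1, \eta)$, $\eta = \frac{1}{2}(1 + \delta)$, when $d \equiv 1$ (modulo 4).

```python
def norm(a, b, d):
    """Norm of a + b*delta (d % 4 in {2, 3}) or of a + b*eta (d % 4 == 1), for d negative."""
    if d % 4 == 1:
        # N(a + b*eta) = (a + b/2)^2 - d*(b/2)^2 = a^2 + a*b + b^2 * (1 - d)/4
        return a * a + a * b + b * b * (1 - d) // 4
    return a * a - d * b * b

def units(d):
    """All coordinate pairs (a, b) of norm 1; the search box is large enough for d negative."""
    return [(a, b) for a in range(-2, 3) for b in range(-2, 3) if norm(a, b, d) == 1]
```

The search finds four units for $d = -1$, six for $d = -3$, and only $\pm 1$ for values such
as $d = -2, -5, -7$.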

Next we investigate factorization of an element $\alpha \in R$ into irreducible factors.

(7.4) Proposition. Existence of factorizations is true in $R$.

Proof. If $\alpha = \beta\gamma$ is a proper factorization in $R$, then $\beta, \gamma$ aren't units. So by
Proposition (7.3), $N(\alpha) = N(\beta)N(\gamma)$ is a proper factorization in the ring of
integers. The existence of factorizations in $R$ now follows from the existence of
factorizations in $\mathbb{Z}$. □

However, factorization into irreducible elements will not be unique in most
cases. We gave a simple example with $d = -5$ in Section 2:

(7.5) $6 = 2 \cdot 3 = (1 + \delta)(1 - \delta)$,

where $\delta = \sqrt{-5}$. For example, to show that $1 + \delta$ is irreducible, we note that its
norm is $(1 + \delta)(1 - \delta) = 6$. A proper factor must have norm 2 or 3, that is,
absolute value $\sqrt{2}$ or $\sqrt{3}$. There are no such points in the lattice $R$.
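
The claim that no element of $\mathbb{Z}[\sqrt{-5}]$ has norm 2 or 3 amounts to saying
that $a^2 + 5b^2 = 2$ and $a^2 + 5b^2 = 3$ have no integer solutions, which a finite
search confirms. The sketch below is illustrative; the function name `elements_of_norm`
is hypothetical, and it assumes $d < 0$ with $d \equiv 2, 3$ (modulo 4) so that the ring
is $\mathbb{Z}[\delta]$.

```python
from math import isqrt

def elements_of_norm(n, d=-5):
    """All pairs (a, b) with a^2 - d*b^2 == n, i.e. elements a + b*delta of norm n."""
    sols = []
    bmax = isqrt(n // -d)                 # |b| cannot exceed sqrt(n / |d|)
    for b in range(-bmax, bmax + 1):
        a2 = n + d * b * b                # the required value of a^2
        a = isqrt(a2)
        if a * a == a2:
            sols.extend({(a, b), (-a, b)})
    return sorted(sols)
```

Both `elements_of_norm(2)` and `elements_of_norm(3)` are empty, while
`elements_of_norm(6)` lists the four elements $\pm 1 \pm \delta$. By the same search, 2 and
3 (of norms 4 and 9) have no proper factors either, which is why (7.5) shows two
essentially different factorizations of 6.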

The same method provides examples for other values of $d$:

(7.6) Proposition. The only ring $R$ with $d \equiv 3$ (modulo 4) which is a unique
factorization domain is the ring of Gauss integers.

Proof. Assume that $d \equiv 3$ (modulo 4), but that $d \neq -1$. Then

$1 - d = 2\left(\frac{1 - d}{2}\right)$ and $1 - d = (1 + \delta)(1 - \delta)$.

There are two factorizations of $1 - d$ in $R$. The element 2 is irreducible because
$N(2) = 4$ is the smallest value $> 1$ taken on by $N(\alpha)$. [The only points of $R$ inside
the circle of radius 2 about the origin are $0, 1, -1$, when $d = -5, -13, -17, \ldots$. See
Figure (6.19).] So if there were a common refinement of the above factorizations, 2
would divide either $1 + \delta$ or $1 - \delta$ in $R$, which it does not: $\frac{1}{2} \pm \frac{1}{2}\delta$ is not in $R$ when
$d \equiv 3$ (modulo 4). □

Notice that this reasoning breaks down if $d \equiv 1$ (modulo 4). In that case, 2
does divide $1 + \delta$, because $\frac{1}{2} + \frac{1}{2}\delta \in R$. In fact, there are more cases of unique
factorization when $d \equiv 1$ (modulo 4). The following theorem is very deep, and we will
not prove it:

(7.7) Theorem. Let $R$ be the ring of integers in the imaginary quadratic field
$\mathbb{Q}(\sqrt{d})$. Then $R$ is a unique factorization domain if and only if $d$ is one of the
integers $-1, -2, -3, -7, -11, -19, -43, -67, -163$.

Gauss proved for these values of $d$ that $R$ is a unique factorization domain. We will
learn how to do this. He also conjectured that there were no others. This much more
difficult part of the theorem was finally proved by Baker and Stark in 1966, after the
problem had been worked on for more than 150 years.

Ideals were introduced to rescue the uniqueness of factorization. As we know
(2.12), $R$ must contain some nonprincipal ideals unless it is a unique factorization
domain. We will see in the next section how these nonprincipal ideals serve as
substitutes for elements.

Note that every nonzero ideal $A$ is a sublattice of $R$: It is a subgroup under
addition, and it is discrete because $R$ is discrete. Moreover, if $\alpha$ is a nonzero element
of $A$, then $\alpha\delta$ is in $A$ too, and $\alpha, \alpha\delta$ are linearly independent over $\mathbb{R}$. However, not
every sublattice is an ideal.

(7.8) Proposition. If $d \equiv 2$ or 3 (modulo 4), the nonzero ideals of $R$ are the
sublattices which are closed under multiplication by $\delta$. If $d \equiv 1$ (modulo 4), they are
the sublattices which are closed under multiplication by $\eta = \frac{1}{2}(1 + \delta)$.

Proof. To be an ideal, a subset $A$ must be closed under addition and under
multiplication by elements of $R$. Any lattice is closed under addition and under
multiplication by integers. So if it is also closed under multiplication by $\delta$, then it is also
closed under multiplication by an element of the form $a + b\delta$, with $a, b \in \mathbb{Z}$. This
includes all elements of $R$ if $d \equiv 2, 3$ (modulo 4). The proof in the case that $d \equiv 1$
(modulo 4) is similar. □

In order to get a feeling for the possibilities, we will describe the ideals of the
ring $R = \mathbb{Z}[\sqrt{-5}]$ before going on. The most interesting ideals are those which are
not principal.

(7.9) Theorem. Let $R = \mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$, and let $A$ be a nonzero ideal of
$R$. Let $\alpha$ be a nonzero element of $A$ of minimal absolute value $|\alpha|$. There are two
cases:

Case 1: $A$ is the principal ideal $(\alpha)$, which has the lattice basis $(\alpha, \alpha\delta)$.

Case 2: $A$ has the lattice basis $(\alpha, \frac{1}{2}(\alpha + \alpha\delta))$, and is not a principal ideal.

The second case can occur only if $\frac{1}{2}(\alpha + \alpha\delta)$ is an element of $R$. The ideal
$A = (2, 1 + \delta)$, which is depicted below, is an example.



(7.10) Figure. The ideal $(2, 1 + \delta)$ in the ring $\mathbb{Z}[\delta]$, $\delta = \sqrt{-5}$.
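
The criterion of Proposition (7.8) can be tested directly on the lattice spanned by 2 and
$1 + \delta$. The sketch below is only an illustration with hypothetical function names:
an element $a + b\delta$ of $\mathbb{Z}[\sqrt{-5}]$ is stored as the pair `(a, b)`, and a
sublattice is given by two basis vectors `u`, `v`.

```python
def in_lattice(w, u, v):
    """True if w is an integer combination x*u + y*v (solved by Cramer's rule)."""
    det = u[0] * v[1] - u[1] * v[0]
    x_num = w[0] * v[1] - w[1] * v[0]
    y_num = u[0] * w[1] - u[1] * w[0]
    return det != 0 and x_num % det == 0 and y_num % det == 0

def times_delta(w, d=-5):
    """Multiply a + b*delta by delta, using delta^2 = d."""
    a, b = w
    return (d * b, a)

def is_ideal(u, v, d=-5):
    """Criterion of (7.8): the lattice is an ideal iff delta*u and delta*v lie in it."""
    return in_lattice(times_delta(u, d), u, v) and in_lattice(times_delta(v, d), u, v)
```

With `u = (2, 0)` and `v = (1, 1)` the test succeeds, since
$2\delta = -1 \cdot 2 + 2(1 + \delta)$ and $\delta(1 + \delta) = -3 \cdot 2 + (1 + \delta)$.
The lattice with basis `(2, 0)`, `(0, 1)` fails it, because $\delta \cdot \delta = -5$ is not
in that lattice, which illustrates that not every sublattice is an ideal.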


The statement of Theorem (7.9) has a geometric interpretation. Notice that
the lattice basis $(\alpha, \alpha\delta)$ of the principal ideal $(\alpha)$ is obtained from the lattice basis
$(1, \delta)$ of $R$ by multiplication by $\alpha$. If we write $\alpha = re^{i\theta}$, then the effect of
multiplication by $\alpha$ is to rotate the complex plane through the angle $\theta$ and then stretch by the
factor $r$. So $(\alpha)$ and $R$ are similar geometric figures, as we noted in Section 2.
Similarly, the basis $(\alpha, \frac{1}{2}(\alpha + \alpha\delta))$ is obtained by multiplication by $\frac{1}{2}\alpha$ from the basis
$(2, 1 + \delta)$. So the ideals listed in Case 2 are geometric figures similar to the one
depicted in Figure (7.10). The similarity classes of ideals are called the ideal classes,
and their number is called the class number of $R$. Thus Theorem (7.9) implies that
the class number of $\mathbb{Z}[\sqrt{-5}]$ is 2. We will discuss ideal classes for other quadratic
imaginary fields in Section 10.

The proof of Theorem (7.9) is based on the following lemma about lattices in
the complex plane:

(7.11) Lemma. Let $r$ be the minimum absolute value among nonzero elements of
a lattice $A$, let $\gamma$ be an element of $A$, and let $n$ be a positive integer. Let $D$ be the
disc of radius $\frac{1}{n}r$ about the point $\frac{1}{n}\gamma$. There is no point of $A$ in the interior of $D$
other than its center $\frac{1}{n}\gamma$.

The point $\frac{1}{n}\gamma$ may lie in $A$ or not. This depends on $A$ and on $\gamma$.

Proof. Let $\beta$ be a point in the interior of $D$. Then by definition of the disc,
$|\beta - \frac{1}{n}\gamma| < \frac{1}{n}r$, or equivalently, $|n\beta - \gamma| < r$. If $\beta \in A$, then $n\beta - \gamma \in A$ too.
In this case, $n\beta - \gamma$ is an element of $A$ of absolute value less than $r$, which implies
that $n\beta - \gamma = 0$, hence that $\beta = \frac{1}{n}\gamma$. □

Proof of Theorem (7.9). Let $\alpha$ be the chosen element of $A$ of minimal absolute
value $r$. The principal ideal $(\alpha) = R\alpha$ consists of the complex numbers $(a + b\delta)\alpha$,
with $a, b \in \mathbb{Z}$. So it has the lattice basis $(\alpha, \alpha\delta)$ as is asserted in the proposition.
Since $A$ contains $\alpha$, it contains the principal ideal $(\alpha)$ too, and if $A = (\alpha)$ we are in
Case 1.

Suppose that $A \supsetneq (\alpha)$, and let $\beta$ be an element of $A$ which is not in $(\alpha)$. We
may choose $\beta$ to lie in the rectangle whose four vertices are $0, \alpha, \alpha\delta, \alpha + \alpha\delta$ [see
Chapter 5 (4.14)]. Figure (7.13) shows a disc of radius $r$ about the four vertices of
this rectangle, and a disc of radius $\tfrac12 r$ about the three half lattice points
$\tfrac12\alpha\delta$, $\tfrac12(\alpha + \alpha\delta)$, and $\alpha + \tfrac12\alpha\delta$. Notice that the interiors of these discs cover the
rectangle. According to Lemma (7.11), the only points of the interiors which can lie
in $A$ are the centers of the discs. Since $\beta$ is not in $(\alpha)$, it is not a vertex of the
rectangle. So $\beta$ must be one of the half lattice points $\tfrac12\alpha\delta$, $\tfrac12(\alpha + \alpha\delta)$, or $\alpha + \tfrac12\alpha\delta$.



(7.13) Figure.

This exhausts the information which we can get from the fact that $A$ is a lattice.
We now use the fact that $A$ is an ideal to rule out the two points $\tfrac12\alpha\delta$ and $\alpha + \tfrac12\alpha\delta$.
Suppose that $\tfrac12\alpha\delta \in A$. Multiplying by $\delta$, we find that $\tfrac12\alpha\delta^2 = -\tfrac52\alpha \in A$ too, and
since $\alpha \in A$, that $\tfrac12\alpha \in A$. This contradicts our choice of $\alpha$. Next, we note that if
$\alpha + \tfrac12\alpha\delta$ were in $A$, then $\tfrac12\alpha\delta$ would be in $A$ too, which has been ruled out. The
remaining possibility is that $\beta = \tfrac12(\alpha + \alpha\delta)$. If so, we are in Case 2. □

8. IDEAL FACTORIZATION

Let $R$ be the ring of integers in an imaginary quadratic field. In order to avoid
confusion, we will denote ordinary integers by latin letters $a, b, \ldots$, elements of $R$ by
greek letters $\alpha, \beta, \ldots$, and ideals by capital letters $A, B, \ldots$. We will consider only
nonzero ideals of $R$.

The notation $A = (\alpha, \beta, \ldots, \gamma)$ stands for the ideal generated by the elements
$\alpha, \beta, \ldots, \gamma$. Since an ideal is a plane lattice, it has a lattice basis consisting of two
elements. Any lattice basis generates the ideal, but we must distinguish between the
notions of a lattice basis and a generating set. We also need to remember the
dictionary (2.2) which relates elements to the principal ideals they generate.

Dedekind extended the notion of divisibility to ideals using the following
definition of ideal multiplication: Let $A$ and $B$ be ideals in a ring $R$. We would like to
define the product ideal $AB$ to be the set of all products $\alpha\beta$, where $\alpha \in A$ and
$\beta \in B$. Unfortunately, this set of products is usually not an ideal: It will not be
closed under sums. To get an ideal, we must put into $AB$ all finite sums of products

(8.1)    $\sum_i \alpha_i\beta_i$, where $\alpha_i \in A$ and $\beta_i \in B$.

The set of such sums is the smallest ideal of $R$ which contains all products $\alpha\beta$, and
we denote this product ideal by $AB$. (This use of the product notation is different
from its use in group theory [Chapter 2 (8.5)].) The definition of multiplication of
ideals is not as simple as we might hope, but it works reasonably well.

Notice that multiplication of ideals is commutative and associative, and that R 
is a unit element. This is why R = (1) is often called the unit ideal: 

(8.2)    $AR = RA = A$,  $AB = BA$,  $A(BC) = (AB)C$.

(8.3) Proposition.

(a) The product of principal ideals is principal: If $A = (\alpha)$ and $B = (\beta)$, then
$AB = (\alpha\beta)$.

(b) Assume that $A = (\alpha)$ is principal, but let $B$ be arbitrary. Then

    $AB = \alpha B = \{\alpha\beta \mid \beta \in B\}$.

(c) Let $\alpha_1, \ldots, \alpha_m$ and $\beta_1, \ldots, \beta_n$ be generators for the ideals $A$ and $B$ respectively.
Then $AB$ is generated as an ideal by the $mn$ products $\alpha_i\beta_j$.

We leave this proof as an exercise. □

In analogy with divisibility of elements of a ring, we say that an ideal A 
divides another ideal B if there is an ideal C such that B = AC. 

To see how multiplication of ideals can be used, let us go back to the example
$d = -5$, in which $2 \cdot 3 = (1 + \delta)(1 - \delta)$. For uniqueness of factorization to hold
in the ring $R = \mathbb{Z}[\delta]$, there would have to be an element $p \in R$ dividing both 2 and
$1 + \delta$. This is the same as saying that 2 and $1 + \delta$ should be in the principal ideal
$(p)$. There is no such element. However, there is an ideal, not a principal ideal,
which contains 2 and $1 + \delta$, namely the ideal generated by these two elements. This
ideal $A = (2, 1 + \delta)$ is depicted in Figure (7.10). We can make three other ideals
using the factors of 6:

    $\bar A = (2, 1 - \delta)$,  $B = (3, 1 + \delta)$,  $\bar B = (3, 1 - \delta)$.

The first of these ideals is denoted by $\bar A$ because it is the complex conjugate of
the ideal $A$:

(8.4)    $\bar A = \{\bar\alpha \mid \alpha \in A\}$.

As a lattice, $\bar A$ is obtained by reflecting the lattice $A$ about the real axis. That the
complex conjugate of any ideal is an ideal is easily seen. Actually, it happens that
our ideal $A$ is equal to its complex conjugate $\bar A$, because $1 - \delta = 2 - (1 + \delta) \in A$.
This is an accidental symmetry of the lattice $A$: The ideals $B$ and $\bar B$ are not the same.

Now let us compute the products of these ideals. According to Proposition
(8.3c), the ideal $A\bar A$ is generated by the four products of the generators $(2, 1 + \delta)$
and $(2, 1 - \delta)$ of $A$ and $\bar A$:

    $A\bar A = (4, 2 + 2\delta, 2 - 2\delta, 6)$.

Each of these four generators is divisible by 2, so $A\bar A \subset (2)$. On the other hand,
$2 = 6 - 4$ is in $A\bar A$. Therefore $(2) \subset A\bar A$, so

    $A\bar A = (2)$!

[The notation (2) is ambiguous, because it can denote both $2R$ and $2\mathbb{Z}$. It stands for
$2R$ here.] Next, the product $AB$ is generated by the four products:

    $AB = (6, 2 + 2\delta, 3 + 3\delta, -4 + 2\delta)$.

Each of these four elements is divisible by $1 + \delta$. Since $1 + \delta = (3 + 3\delta) - (2 + 2\delta)$
is in $AB$, we find that $AB = (1 + \delta)$. Similarly, $\bar A\bar B = (1 - \delta)$ and $B\bar B = (3)$.

It follows that the principal ideal (6) is the product of the four ideals:

(8.5)    $(6) = (2)(3) = (A\bar A)(B\bar B) = (AB)(\bar A\bar B) = (1 + \delta)(1 - \delta)$.

Isn't this beautiful? The ideal factorization $(6) = A\bar A B\bar B$ has provided a common
refinement of the two factorizations (2.7).
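
The products above can be checked mechanically. The following is a minimal sketch, not part of the original text: it stores an element $a + b\delta$ of $\mathbb{Z}[\delta]$, $\delta^2 = -5$, as the integer pair `(a, b)`, and it compares ideals by reducing their lattices to a canonical basis. The helper names `mul`, `lattice_hnf`, `ideal`, and `product` are our own choices for illustration.

```python
from math import gcd

D = -5  # delta**2 = D; the element a + b*delta is stored as the pair (a, b)

def mul(x, y):
    """Product of two elements of Z[delta]."""
    a, b = x
    c, e = y
    return (a * c + D * b * e, a * e + b * c)

def lattice_hnf(vectors):
    """Canonical basis ((a, b), (0, c)) of the lattice spanned by integer pairs."""
    a, b, c = 0, 0, 0
    for x, y in vectors:
        # Euclidean algorithm on the first coordinates of (a, b) and (x, y).
        while x != 0:
            q = a // x
            a, b, x, y = x, y, a - q * x, b - q * y
        c = gcd(c, y)  # the leftover vector (0, y) merges with (0, c)
    if a < 0:
        a, b = -a, -b
    if c:
        b %= c
    return (a, b), (0, c)

def ideal(gens):
    """Lattice of the ideal generated by gens: it is spanned by g and g*delta."""
    return lattice_hnf([v for g in gens for v in (g, mul(g, (0, 1)))])

def product(gens_a, gens_b):
    """Product ideal, generated by the pairwise products as in (8.3c)."""
    return ideal([mul(x, y) for x in gens_a for y in gens_b])

A, A_bar = [(2, 0), (1, 1)], [(2, 0), (1, -1)]
B, B_bar = [(3, 0), (1, 1)], [(3, 0), (1, -1)]

print(product(A, A_bar) == ideal([(2, 0)]))       # A * A-bar = (2)
print(product(A, B) == ideal([(1, 1)]))           # A * B = (1 + delta)
print(product(A_bar, B_bar) == ideal([(1, -1)]))  # A-bar * B-bar = (1 - delta)
print(product(B, B_bar) == ideal([(3, 0)]))       # B * B-bar = (3)
print(ideal(A) == ideal(A_bar), ideal(B) == ideal(B_bar))
```

The last line compares each ideal with its conjugate; it reflects the remark that $A = \bar A$ is an accidental symmetry, while $B$ and $\bar B$ differ.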

The rest of this section is devoted to proving unique factorization of ideals in
the ring of integers of an imaginary quadratic number field. We will follow the
discussion of factorization of elements as closely as possible.

The first thing to do is to find an analogue for ideals of the notion of a prime 
element. 

(8.6) Proposition. Let $P$ be an ideal of a ring $R$ which is not the unit ideal. The
following conditions are equivalent:

(i) If $\alpha, \beta$ are elements of $R$ such that $\alpha\beta \in P$, then $\alpha \in P$ or $\beta \in P$.

(ii) If $A, B$ are ideals of $R$ such that $AB \subset P$, then $A \subset P$ or $B \subset P$.

(iii) The quotient ring $R/P$ is an integral domain.

An ideal which satisfies one of these conditions is called a prime ideal.

For example, every maximal ideal is prime, because if $M$ is maximal, then
$R/M$ is a field, and a field is an integral domain. The zero ideal of a ring $R$ is prime
if and only if $R$ is an integral domain.

Proof of the Proposition: The conditions for $\bar R = R/P$ to be an integral
domain are that $\bar R \neq 0$ and that $\bar\alpha\bar\beta = 0$ implies $\bar\alpha = 0$ or $\bar\beta = 0$. These conditions
translate back to $P \neq R$ and if $\alpha\beta \in P$ then $\alpha \in P$ or $\beta \in P$. Thus (i) and (iii) are
equivalent. The fact that (ii) implies (i) is seen by taking $A = (\alpha)$ and $B = (\beta)$. The
only surprising implication is that (i) implies (ii). Assume that (i) holds, and let $A, B$
be ideals such that $AB \subset P$. If $A$ is not contained in $P$, there is some element
$\alpha \in A$ which is not in $P$. If $\beta$ is an element of $B$, then $\alpha\beta \in AB$; hence $\alpha\beta \in P$.
By part (i), $\beta \in P$. Since this is true for all of its elements, $B \subset P$ as required. □

We now go back to imaginary quadratic number fields.

(8.7) Lemma. Let $A \subset B$ be lattices in $\mathbb{R}^2$. There are only finitely many lattices $L$
between $A$ and $B$, that is, such that $A \subset L \subset B$.

Proof. Let $(a_1, a_2)$ be a lattice basis for $A$, and let $P$ be the parallelogram with
vertices $0, a_1, a_2, a_1 + a_2$. There are finitely many elements of $B$ contained in $P$
[Chapter 5 (4.12)], so if $L$ is a lattice between $A$ and $B$, there are finitely many
possibilities for the set $L \cap P$. Call this set $S$. The proof will be completed by showing
that $S$ and $A$ determine the lattice $L$. To show this, let $\gamma$ be an element of $L$. Then
there is an element $a \in A$ such that $\gamma - a$ is in $P$, hence in $S$. [See the proof of
(4.14) in Chapter 5.] Symbolically, we have $L = S + A$. This describes $L$ in terms
of $S$ and $A$, as required. □

(8.8) Proposition. Let $R$ be the ring of integers in an imaginary quadratic number
field.

(a) Let $B$ be a nonzero ideal of $R$. There are finitely many ideals between $B$ and $R$.

(b) Every proper ideal of $R$ is contained in a maximal ideal.

(c) The nonzero prime ideals of $R$ are the maximal ideals.

Proof.

(a) This follows from Lemma (8.7).

(b) Let $B$ be a proper ideal. Then $B$ is contained in only finitely many ideals. We
can search through them to find a maximal ideal.

(c) We have already remarked that maximal ideals are prime. Conversely, let $P$ be a
nonzero prime ideal. Then $P$ has finite index in $R$. So $R/P$ is a finite integral
domain, and hence it is a field [Chapter 10 (6.4)]. This shows that $P$ is a maximal
ideal. □

(8.9) Theorem. Let $R$ be the ring of integers in an imaginary quadratic field $F$.
Every nonzero ideal of $R$ which is not the whole ring is a product of prime ideals.
This factorization is unique, up to order of the factors.

This remarkable theorem can be extended to other rings of algebraic integers,
but it is a very special property of such rings. Most rings do not admit unique
factorization of ideals. Several things may fail, and we want to take particular note of one
of them. We know that a principal ideal $(\alpha)$ contains another principal ideal $(\beta)$ if
and only if $\alpha$ divides $\beta$ in the ring. So the definition of a prime element $\pi$ can be
restated as follows: If $(\pi) \supset (\alpha\beta)$, then $(\pi) \supset (\alpha)$ or $(\pi) \supset (\beta)$. The second of the
equivalent definitions (8.6) of a prime ideal is the analogous statement for ideals: If
$P \supset AB$, then $P \supset A$ or $P \supset B$. So if inclusion of ideals were equivalent with
divisibility, the proof of uniqueness of factorizations would carry over to ideals.
Unfortunately the cumbersome definition of product ideal causes trouble. In most rings,
the inclusion $A \supset B$ does not imply that $A$ divides $B$. This weakens the analogy
between prime ideal and prime element. It will be important to establish the
equivalence of inclusion and divisibility in the particular rings we are studying. This is
done below, in Proposition (8.11).

We now proceed with the proof of Theorem (8.9). For the rest of this section,
$R$ will denote the ring of integers in an imaginary quadratic number field. The proof
is based on the following lemma:

(8.10) Main Lemma. Let $R$ be the ring of integers in an imaginary quadratic
number field. The product of a nonzero ideal and its conjugate is a principal ideal of
$R$ generated by an ordinary integer:

    $A\bar A = (n)$, for some $n \in \mathbb{Z}$.

The most important point here is that for every ideal $A$ there is some ideal $B$ such
that $AB$ is principal. That $\bar A$ does the job and that the product ideal is generated by an
ordinary integer are less important points.

We will prove the lemma at the end of the section. Let us assume it for now
and derive some consequences for multiplication of ideals. Because these
consequences depend on the Main Lemma, they are not true for general rings.

(8.11) Proposition. Let $R$ be the ring of integers in an imaginary quadratic
number field.

(a) Cancellation Law: Let $A, B, C$ be nonzero ideals of $R$. If $AB \supset AC$ then
$B \supset C$. If $AB = AC$, then $B = C$.

(b) If $A$ and $B$ are nonzero ideals of $R$, then $A \supset B$ if and only if $A$ divides $B$, that
is, if and only if $B = AC$ for some ideal $C$.

(c) Let $P$ be a nonzero prime ideal of $R$. If $P$ divides a product $AB$ of ideals, then
$P$ divides one of the factors $A$ or $B$.

Proof. (a) Assume that $AB \supset AC$. If $A = (\alpha)$ is principal, then $AB = \alpha B$ and
$AC = \alpha C$ (8.3). Viewing these sets as subsets of the complex numbers, we multiply
the relation $\alpha B \supset \alpha C$ on the left by $\alpha^{-1}$ to conclude that $B \supset C$. So the assertion is
true when $A$ is principal. In general, if $AB \supset AC$, then multiply both sides by $\bar A$ and
apply the Main Lemma: $nB = \bar A AB \supset \bar A AC = nC$, and apply what has been shown.
The case that $AB = AC$ is the same.

(b) The implication which is not clear is that if $A$ contains $B$ then $A$ divides $B$. We
will first check this when $A = (\alpha)$ is principal. In this case, to say that $(\alpha) \supset B$
means that $\alpha$ divides every element $\beta$ of $B$. Let $C = \alpha^{-1}B$ be the set of quotients,
that is, the set of elements $\alpha^{-1}\beta$, with $\beta \in B$. You can check that $C$ is an ideal and
that $\alpha C = B$. Hence $B = AC$ in this case. Now let $A$ be arbitrary, and assume that
$A \supset B$. Then $(n) = \bar A A \supset \bar A B$. By what has already been shown, there is an ideal $C$
such that $nC = \bar A B$, or $\bar A AC = \bar A B$. By the Cancellation Law, $AC = B$.

(c) To prove part (c) of the proposition, we apply part (b) to translate divisibility into
inclusion. Then (c) follows from the definition of prime ideal. □

Proof of Theorem (8.9). There are two things to prove. First we must show that
every proper, nonzero ideal $A$ is a product of prime ideals. If $A$ is not itself prime, then
it is not maximal, so we can find a proper ideal $A_1$ strictly larger than $A$. Then $A_1$
divides $A$ (8.11b), so we can write $A = A_1B_1$. It follows that $A \subset B_1$. Moreover, if
we had $A = B_1$, the Cancellation Law would imply $R = A_1$, contradicting the fact
that $A_1$ is a proper ideal. Thus $A \subsetneq B_1$. Similarly, $A \subsetneq A_1$. Since there are only
finitely many ideals between $A$ and $R$, this process of factoring an ideal terminates.
When it does, all factors will be maximal, and hence prime. So every proper ideal $A$
can be factored into primes.

Now to prove uniqueness, we apply the property (8.11c) of prime ideals: If
$P_1 \cdots P_r = Q_1 \cdots Q_s$, with $P_i, Q_j$ prime, then $P_1$ divides $Q_1 \cdots Q_s$, and hence it
divides one of the factors, say $Q_1$. Since $Q_1$ is maximal, $P_1 = Q_1$. Cancel by (8.11a)
and use induction on $r$. □

(8.12) Theorem. The ring of integers $R$ is a unique factorization domain if and
only if it is a principal ideal domain. If so, then the factorizations of elements and of
ideals correspond naturally.

Proof. We already know that a principal ideal domain has unique factorization
(2.12). Conversely, suppose that $R$ is a unique factorization domain, and let $P$ be any
nonzero prime ideal of $R$. Then $P$ contains an irreducible element, say $\pi$. For, any
nonzero element $\alpha$ of $P$ is a product of irreducible elements, and, by definition of
prime ideal, $P$ contains one of its irreducible factors. By (2.8), an irreducible
element $\pi$ is prime, that is, $(\pi)$ is a prime ideal. By (8.8c), $(\pi)$ is maximal. Since
$(\pi) \subset P$, it follows that $(\pi) = P$, hence that $P$ is principal. By Theorem (8.9),
every nonzero ideal $A$ is a product of primes; hence it is principal (8.3a). Thus $R$ is a
principal ideal domain. The last assertion of the theorem is clear from (2.2). □

Proof of the Main Lemma (8.10). We can generate $A$ as a lattice by two elements,
say $\alpha, \beta$. Then $A$ is certainly generated as an ideal by these same elements, and
moreover $\bar\alpha, \bar\beta$ generate $\bar A$. Hence the four products $\alpha\bar\alpha$, $\alpha\bar\beta$, $\bar\alpha\beta$, $\beta\bar\beta$ generate the
ideal $A\bar A$. Consider the three elements $\alpha\bar\alpha$, $\beta\bar\beta$, and $\alpha\bar\beta + \bar\alpha\beta$ of $A\bar A$. They are all
equal to their conjugates and hence are rational numbers. Since they are algebraic
integers, they are ordinary integers. Let $n$ be their greatest common divisor in $\mathbb{Z}$.
Then $n$ is a linear combination of $\alpha\bar\alpha$, $\beta\bar\beta$, $\alpha\bar\beta + \bar\alpha\beta$ with integer coefficients. Hence
$n$ is in the product ideal $A\bar A$. Therefore $A\bar A \supset (n)$. If we show that $n$ divides each of
the four generators of the ideal $A\bar A$ in $R$, then it will follow that $(n) \supset A\bar A$, hence
that $(n) = A\bar A$, as was to be shown.

Now by construction, $n$ divides $\alpha\bar\alpha$ and $\beta\bar\beta$ in $\mathbb{Z}$, hence in $R$. So we have to
show that $n$ divides $\alpha\bar\beta$ and $\bar\alpha\beta$ in $R$. The elements $(\alpha\bar\beta)/n$ and $(\bar\alpha\beta)/n$ are roots of
the polynomial $x^2 - rx + s$, where

    $r = \dfrac{\alpha\bar\beta + \bar\alpha\beta}{n}$  and  $s = \dfrac{\alpha\bar\alpha}{n} \cdot \dfrac{\beta\bar\beta}{n}$.

By definition of $n$, these elements $r, s$ are integers, so this is a monic equation in
$\mathbb{Z}[x]$. Hence $(\alpha\bar\beta)/n$ and $(\bar\alpha\beta)/n$ are algebraic integers, as required. □
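
As a check on this construction, here is the computation for one familiar case. The choice of example is ours: we take the ideal $A = (2, 1 + \delta)$ of $\mathbb{Z}[\sqrt{-5}]$ from Figure (7.10), with the lattice basis $\alpha = 2$, $\beta = 1 + \delta$.

```latex
\begin{align*}
\alpha\bar\alpha &= 4, \qquad
\beta\bar\beta = (1+\delta)(1-\delta) = 6, \qquad
\alpha\bar\beta + \bar\alpha\beta = 2(1-\delta) + 2(1+\delta) = 4, \\
n &= \gcd(4, 6, 4) = 2, \qquad\text{so } A\bar A = (2), \\
r &= \tfrac{4}{2} = 2, \qquad s = \tfrac{4}{2}\cdot\tfrac{6}{2} = 6, \\
x^2 - 2x + 6 &= \bigl(x - (1-\delta)\bigr)\bigl(x - (1+\delta)\bigr),
\end{align*}
```

The roots $1 - \delta = (\alpha\bar\beta)/n$ and $1 + \delta = (\bar\alpha\beta)/n$ are indeed elements of $R$, and the value $n = 2$ agrees with the direct computation $A\bar A = (2)$ made earlier in this section.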

Note. This is the only place where the definition of algebraic integer is used
directly. The lemma would be false if we took a smaller ring than $R$, for example, if
we didn't take the elements with half integer coefficients when $d \equiv 1$ (modulo 4).

9. THE RELATION BETWEEN PRIME IDEALS OF R
AND PRIME INTEGERS

We saw in Section 5 how the primes in the ring of Gauss integers are related to
integer primes. A similar analysis can be made for the ring $R$ of integers in a quadratic
number field. The main difference is that $R$ is usually not a principal ideal domain,
and therefore we should speak of prime ideals rather than of prime elements. This
complicates the analogues of parts (c) and (d) of Theorem (5.1), and we will not
consider them here. [However, see (12.10).]

(9.1) Proposition. Let $P$ be a nonzero prime ideal of $R$. There is an integer prime
$p$ so that either $P = (p)$ or $P\bar P = (p)$. Conversely, let $p$ be a prime integer. There is
a prime ideal $P$ of $R$ so that either $P = (p)$ or $P\bar P = (p)$.

The proof follows that of parts (a) and (b) of Theorem (5.1) closely. □

The second case of (9.1) is often subdivided into two cases, according to
whether or not $P$ and $\bar P$ are equal. The following terminology is customary: If $(p)$ is
a prime ideal, then we say that $p$ remains prime in $R$. If $P\bar P = (p)$, then we say that
$p$ splits in $R$, unless $P = \bar P$, in which case we say that $p$ ramifies in $R$.

Let us analyze the behavior of primes further. Assume that $d \equiv 2$ or 3
(modulo 4). In this case, $R = \mathbb{Z}[\delta]$ is isomorphic to $\mathbb{Z}[x]/(x^2 - d)$. To ask for prime
ideals containing the ideal $(p)$ is equivalent to asking for prime ideals of the ring $R/(p)$
[Chapter 10 (4.3)]. Note that

(9.2)    $R/(p) \approx \mathbb{Z}[x]/(x^2 - d, p)$.

Interchanging the order of the two relations $x^2 - d = 0$ and $p = 0$ as in the proof
of Theorem (5.1), we find the first part of the proposition below. The second part is
obtained in the same way, using the polynomial (6.16).

(9.3) Proposition.

(a) Assume that $d \equiv 2$ or 3 (modulo 4). An integer prime $p$ remains prime in $R$ if
and only if the polynomial $x^2 - d$ is irreducible over $\mathbb{F}_p$.

(b) Assume that $d \equiv 1$ (modulo 4). Then $p$ remains prime if and only if the
polynomial $x^2 - x + \tfrac14(1 - d)$ is irreducible over $\mathbb{F}_p$. □
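
Proposition (9.3) turns the question into a finite check, because a quadratic polynomial is irreducible over $\mathbb{F}_p$ exactly when it has no root there. The sketch below is an illustration only: the function names are ours, and it assumes that $d$ is a square-free negative integer and that $p$ is prime.

```python
def quadratic_for(d):
    """Coefficients (b, c) of the polynomial x**2 + b*x + c used in (9.3)."""
    if d % 4 == 1:
        return (-1, (1 - d) // 4)  # x^2 - x + (1 - d)/4
    return (0, -d)                 # x^2 - d

def remains_prime(p, d):
    """True if the quadratic has no root modulo p, i.e. is irreducible over F_p."""
    b, c = quadratic_for(d)
    return all((x * x + b * x + c) % p != 0 for x in range(p))

for d in (-5, -14, -23, -67):
    inert = [p for p in (2, 3, 5, 7, 11) if remains_prime(p, d)]
    print(d, inert)
```

For each $d$ the loop lists which of the primes 2, 3, 5, 7, 11 remain prime in $R$. Python's `%` returns a nonnegative remainder for a positive modulus, so the test `d % 4 == 1` behaves as intended for negative $d$.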

10. IDEAL CLASSES IN IMAGINARY QUADRATIC FIELDS

As before, $R$ denotes the ring of integers in an imaginary quadratic number field. In
order to analyze the extent to which uniqueness of factorization of elements fails in
$R$, we introduce an equivalence relation on ideals which is compatible with ideal
multiplication and such that the principal ideals form one equivalence class. It is
reasonably clear which relation to use: We call two ideals $A, B$ similar ($A \sim B$) if there
are nonzero elements $\sigma, \tau \in R$ so that

(10.1)    $\sigma B = \tau A$.

This is an equivalence relation. The equivalence classes for this relation are called
ideal classes, and the ideal class of $A$ will be denoted by $\langle A \rangle$.

We could also take the element $\lambda = \sigma^{-1}\tau$ of the quadratic number field
$F = \mathbb{Q}[\delta]$ and say that $A$ and $B$ are similar if

(10.2)    $B = \lambda A$, for some $\lambda \in \mathbb{Q}[\delta]$.

Similarity has a nice geometric interpretation. Two ideals $A$ and $B$ are similar
if the lattices in the complex plane which represent them are similar geometric
figures, by a similarity which is orientation-preserving. To see this, note that a
lattice looks the same at all points. So a similarity can be assumed to relate 0 in $A$ to 0
in $B$. Then it will be described as a rotation followed by a stretching or shrinking,
that is, as multiplication by a complex number $\lambda$. Since multiplication by $\lambda$ carries a
nonzero element $\alpha \in A$ to an element $\lambda\alpha = \beta \in B$, $\lambda = \beta\alpha^{-1}$ is automatically in
the field $F$.

An ideal $B$ is similar to the unit ideal $R$ if and only if $B = \lambda R$ for some $\lambda$ in the
field. Then $\lambda$ is an element of $B$, hence of $R$. In this case, $B$ is the principal ideal $(\lambda)$.
So we have the following:

(10.3) Proposition. The ideal class $\langle R \rangle$ consists of the principal ideals. □

Figure (10.4) shows the principal ideal $(1 + \delta)$ in the ring $\mathbb{Z}[\delta]$, where $\delta^2 = -5$.

(10.4) Figure. The principal ideal $(1 + \delta)$.


We saw in (7.9) that there are two ideal classes. Each of the ideals $A = (2, 1 + \delta)$
and $B = (3, 1 + \delta)$, for example, represents the class of nonprincipal ideals. In this
case $2B = (1 + \delta)A$. These ideals are depicted in Figure (10.5).

(10.5) Figure. The ideals $(2, 1 + \delta)$ and $(3, 1 + \delta)$.

(10.6) Proposition. The ideal classes form an abelian group $\mathcal{C}$, with law of
composition induced by multiplication of ideals:

    $\langle A \rangle \langle B \rangle$ = class of $AB$ = $\langle AB \rangle$;

the class of the principal ideals is the identity: $\langle R \rangle = \langle 1 \rangle$.

Proof. If $A \sim A'$ and $B \sim B'$, then $A' = \lambda A$ and $B' = \mu B$ for some
$\lambda, \mu \in F = \mathbb{Q}[\delta]$; hence $A'B' = \lambda\mu AB$. This shows that $\langle AB \rangle = \langle A'B' \rangle$, hence
that this law of composition is well-defined. Next, the law is commutative and
associative because multiplication of ideals is, and the class of $R$ is an identity (8.2).
Finally, $A\bar A = (n)$ is principal by the Main Lemma (8.10). Since the class of the
principal ideal $(n)$ is the identity in $\mathcal{C}$, we have $\langle A \rangle \langle \bar A \rangle = \langle R \rangle$, so $\langle \bar A \rangle = \langle A \rangle^{-1}$. □

(10.7) Corollary. Let $R$ be the ring of integers in an imaginary quadratic number
field. The following assertions are equivalent:

(i) $R$ is a principal ideal domain;

(ii) $R$ is a unique factorization domain;

(iii) the ideal class group $\mathcal{C}$ of $R$ is the trivial group.

For to say that $\mathcal{C}$ is trivial is the same as saying that every ideal is similar to the unit
ideal, which by Proposition (10.3) means that every ideal is principal. By Theorem
(8.12), this occurs if and only if $R$ is a unique factorization domain. □

Because of Corollary (10.7), it is natural to count the ideal classes and to
consider this count, called the class number, a measure of nonuniqueness of
factorization of elements in $R$. More precise information is given by the structure of $\mathcal{C}$ as a
group. As we have seen (7.9), there are two ideal classes in the ring $\mathbb{Z}[\sqrt{-5}]$, so its
ideal class group is a cyclic group of order 2 and its class number is 2.

We will now show that the ideal class group $\mathcal{C}$ is always a finite group. The
proof is based on a famous lemma of Minkowski about lattice points in convex
regions. A bounded subset $S$ of the plane $\mathbb{R}^2$ is called convex and centrally symmetric
if it has these properties:

(10.8) (a) Convexity: If $p, q \in S$, then the line segment joining $p$ to $q$ is in $S$.

(b) Central symmetry: If $p \in S$, then $-p \in S$.

Notice that these conditions imply that $0 \in S$, unless $S$ is empty.

(10.9) Minkowski's Lemma. Let $L$ be a lattice in $\mathbb{R}^2$, and let $S$ be a convex,
centrally symmetric subset of $\mathbb{R}^2$. Let $\Delta(L)$ denote the area of the parallelogram spanned
by a lattice basis for $L$. If

    $\mathrm{Area}(S) > 4\Delta(L)$,

then $S$ contains a lattice point other than 0.

Proof. Define $U$ to be the convex set similar to $S$, but with half the linear
dimension. In other words, we put $p \in U$ if $2p \in S$. Then $U$ is also convex and
centrally symmetric, and $\mathrm{Area}(U) = \tfrac14\mathrm{Area}(S)$. So the above inequality can be restated
as $\mathrm{Area}(U) > \Delta(L)$.

(10.10) Figure.

(10.11) Lemma. There is a nonzero element $\alpha \in L$ such that $U \cap (U + \alpha)$ is not
empty.

Proof. Let $P$ be the parallelogram spanned by a lattice basis for $L$. The
translates $P + \alpha$ with $\alpha \in L$ cover the plane without overlapping except along their
edges. The heuristic reason that the lemma is true is this: There is one translate
$U + \alpha$ for each translate $P + \alpha$, and the area of $U$ is larger than the area of $P$. So
the translates $U + \alpha$ must overlap. To make this precise, we note that since $U$ is a
bounded set, it meets finitely many of the translates $P + \alpha$, say it meets
$P + \alpha_1, \ldots, P + \alpha_k$. Denote by $U_i$ the set $(P + \alpha_i) \cap U$. Then $U$ is cut into the
pieces $U_1, \ldots, U_k$, and $\mathrm{Area}(U) = \sum \mathrm{Area}(U_i)$. We translate $U_i$ back to $P$ by
subtracting $\alpha_i$, setting $V_i = U_i - \alpha_i$, and we note that $V_i = P \cap (U - \alpha_i)$. So $V_i$ is a subset
of $P$, and $\mathrm{Area}(V_i) = \mathrm{Area}(U_i)$. Then $\sum \mathrm{Area}(V_i) = \mathrm{Area}(U) > \Delta(L) = \mathrm{Area}(P)$.
This implies that two of the sets $V_i$ must overlap, that is, that for some $i \neq j$,
$(U - \alpha_i) \cap (U - \alpha_j)$ is nonempty. Adding $\alpha_i$ and setting $\alpha = \alpha_i - \alpha_j$, we find
that $U \cap (U + \alpha)$ is nonempty too.

Returning to the proof of Minkowski's Lemma, choose $\alpha$ as in Lemma
(10.11), and let $p$ be a point of $U \cap (U + \alpha)$. From $p \in U + \alpha$, it follows that
$p - \alpha \in U$. By central symmetry, $q = \alpha - p \in U$ too. The midpoint between
$p$ and $q$ is $\tfrac12\alpha$, which is also in $U$, because $U$ is convex. Therefore $\alpha \in S$, as
required. □

(10.12) Corollary. Any lattice $L$ in $\mathbb{R}^2$ contains a nonzero vector $\alpha$ such that

    $|\alpha|^2 \le 4\Delta(L)/\pi$.

Proof. We apply Minkowski's Lemma, taking for $S$ a circle of radius $r$ about
the origin. The lemma guarantees the existence of a nonzero lattice point in $S$,
provided that $\pi r^2 > 4\Delta(L)$, or that $r^2 > 4\Delta(L)/\pi$. So for any positive number $\epsilon$, there
is a nonzero lattice point $\alpha$ with $|\alpha|^2 < 4\Delta(L)/\pi + \epsilon$. Since there are only finitely many
lattice points in a bounded region and since $\epsilon$ can be arbitrarily small, there is a lattice
point satisfying the desired inequality. □
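
The bound of Corollary (10.12) is easy to test numerically. The sketch below is our own illustration: it finds a shortest nonzero vector by brute force over small integer coefficients, which we assume is enough for the two sample bases chosen here (the lattice of $\mathbb{Z}[\sqrt{-5}]$ and that of the ideal $(2, 1 + \delta)$). It is not a general shortest-vector algorithm.

```python
from math import pi, sqrt

def area(u, v):
    """Area of the parallelogram spanned by the plane vectors u and v."""
    return abs(u[0] * v[1] - u[1] * v[0])

def shortest_norm_sq(u, v, k=10):
    """Smallest |m*u + n*v|**2 over integers |m|, |n| <= k, not both zero."""
    best = None
    for m in range(-k, k + 1):
        for n in range(-k, k + 1):
            if (m, n) == (0, 0):
                continue
            x = m * u[0] + n * v[0]
            y = m * u[1] + n * v[1]
            s = x * x + y * y
            if best is None or s < best:
                best = s
    return best

lattices = {
    "R = Z[sqrt(-5)]": ((1, 0), (0, sqrt(5))),
    "A = (2, 1 + delta)": ((2, 0), (1, sqrt(5))),
}
for name, (u, v) in lattices.items():
    bound = 4 * area(u, v) / pi
    print(name, shortest_norm_sq(u, v), "<=", round(bound, 3))
```

In both cases the shortest squared length found lies below the Minkowski bound $4\Delta(L)/\pi$, as the corollary predicts.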

We now return to ideals in the ring $R$ of integers in an imaginary quadratic
field. There are two measures for the size of an ideal, which turn out to be the same.
The first is the index in $R$. Since an ideal $A$ is a sublattice of $R$, it has finite index:

    $[R : A]$ = number of additive cosets of $A$ in $R$.

This index can be expressed in terms of the area of the parallelogram spanned by
basis vectors:


(10.13) Lemma. Let $(a_1, a_2)$ and $(b_1, b_2)$ be lattice bases for lattices $B \supset A$ in $\mathbb{R}^2$,
and let $\Delta(A)$ and $\Delta(B)$ be the areas of the parallelograms spanned by these bases.
Then $[B : A] = \Delta(A)/\Delta(B)$.

We leave the proof as an exercise. □

(10.14) Corollary.

(a) Let $A$ be a plane lattice. The area $\Delta(A)$ is independent of the lattice basis for $A$.

(b) If $C \supset B \supset A$ are lattices, then $[C : A] = [C : B][B : A]$. □


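It is easy to compute the area $\Delta(R)$ using the description (6.14) of the ring:

(10.15)    $\Delta(R) = \tfrac12\sqrt{|D|} = \begin{cases} \sqrt{|d|} & \text{if } d \equiv 2, 3 \pmod 4 \\ \tfrac12\sqrt{|d|} & \text{if } d \equiv 1 \pmod 4 \end{cases}$

where $D$ is the discriminant (6.18).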

The other measure of the size of an ideal can be obtained from the Main
Lemma (8.10): We write $A\bar A = (n)$ and take the integer $n$ (chosen $> 0$, of course).
This is analogous to the norm of an element (7.1) and is therefore called the norm of
the ideal:

(10.16)    $N(A) = n$, if $A\bar A = (n)$.

It has the multiplicative property

(10.17)    $N(AB) = N(A)N(B)$,

because $AB\,\overline{AB} = A\bar A B\bar B = (nm)$ if $N(B) = m$. Note also that if $A$ is the principal
ideal $(\alpha)$, then its norm is the norm of $\alpha$:

(10.18)    $N((\alpha)) = \alpha\bar\alpha = N(\alpha)$,

because $(\alpha)(\bar\alpha) = (\alpha\bar\alpha)$.

(10.19) Lemma. For any nonzero ideal $A$ of $R$,

    $[R : A] = N(A)$.

(10.20) Corollary. Multiplicative property of the index: Let $A$ and $B$ be nonzero
ideals of $R$. Then

    $[R : AB] = [R : A][R : B]$. □

Let us defer the proof of Lemma (10.19) and derive the finiteness of the class
number from it.


(10.21) Theorem. Let $\mu = 2\sqrt{|D|}/\pi$. Every ideal class contains an ideal $A$ such
that $N(A) \le \mu$.

Proof. Let $A$ be an ideal. We have to find another ideal $A'$ in the class of $A$
whose norm is not greater than $\mu$. We apply Corollary (10.12): There is a nonzero
element $\alpha \in A$ with

    $N(\alpha) = |\alpha|^2 \le 4\Delta(A)/\pi$.

Then $A \supset (\alpha)$. This implies that $A$ divides $(\alpha)$, that is, that $AC = (\alpha)$ for some ideal
$C$. By the multiplicative property of norms (10.17) and by (10.18), $N(A)N(C) =
N(\alpha) \le 4\Delta(A)/\pi$. Using (10.13), (10.14), and (10.19), we write $\Delta(A) =
[R : A]\Delta(R) = \tfrac12 N(A)\sqrt{|D|}$. Substituting for $\Delta(A)$ and cancelling $N(A)$, we find
$N(C) \le \mu$.

Now since $CA$ is a principal ideal, the class $\langle C \rangle$ is the inverse of $\langle A \rangle$, i.e.,
$\langle C \rangle = \langle \bar A \rangle$. So we have shown that $\langle \bar A \rangle$ contains an ideal whose norm satisfies the
required inequality. Interchanging the roles of $A$ and $\bar A$ completes the proof. □
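
To see what the theorem says in a familiar case, consider $d = -5$. The following worked computation is our own illustration; it uses the factorization $(2) = A\bar A = A^2$ with $A = (2, 1 + \delta)$ found in Section 8 and the unique factorization theorem (8.9).

```latex
\begin{align*}
D &= 4d = -20, \qquad
\mu = \frac{2\sqrt{20}}{\pi} \approx 2.85, \\
N(C) \le \mu &\iff N(C) \in \{1, 2\}, \\
N(C) = 1 &\implies C\bar C = (1) \implies C = R, \\
N(C) = 2 &\implies C\bar C = (2) = A^2 \implies C = A .
\end{align*}
```

The last implication uses the fact that $A$ is a prime ideal: it has index 2 in $R$, hence is maximal. So every ideal class of $\mathbb{Z}[\sqrt{-5}]$ contains $R$ or $A$, and there are at most two classes, in agreement with (7.9).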

The finiteness of the class number follows easily: 

(10.22) Theorem. The ideal class group $\mathcal{C}$ is finite.

Proof. Because of (10.19) and (10.21), it is enough to show that there are
finitely many ideals with index $[R : A] \le \mu$, so it is enough to show that there are
only finitely many sublattices $L \subset R$ with $[R : L] \le \mu$. Choose an integer $n \le \mu$,
and let $L$ be a sublattice such that $[R : L] = n$. Then $R/L$ is an abelian group of
order $n$, so multiplication by $n$ is the zero map on this group. The translation of this
fact to $R$ is the statement $nR \subset L$: Sublattices of index $n$ contain $nR$. Lemma (8.7)
implies that there are finitely many such lattices $L$. Since there are also finitely many
possibilities for $n$, we are done. □

The ideal class group can be computed explicitly by checking which of the
sublattices $L \subset R$ of index $\le \mu$ are ideals. However, this is not efficient. It is better to
look directly for prime ideals. Let $[\mu]$ denote the largest integer less than $\mu$.

(10.23) Proposition. The ideal class group $\mathcal{C}$ is generated by the classes of the
prime ideals $P$ which divide integer primes $p \le [\mu]$.

Proof. We know that every class contains an ideal $A$ of norm $N(A) \le \mu$, and
since $N(A)$ is an integer, $N(A) \le [\mu]$. Suppose that an ideal $A$ with norm $\le \mu$ is
factored into prime ideals: $A = P_1 \cdots P_r$. Then $N(A) = N(P_1) \cdots N(P_r)$, by
(10.17). Hence $N(P_i) \le [\mu]$ for each $i$. So the classes of prime ideals $P$ of norm
$\le [\mu]$ form a set of generators of $\mathcal{C}$, as claimed. □


To apply this proposition, we examine each prime integer $p \le [\mu]$. If $p$
remains prime in $R$, then the prime ideal $(p)$ is principal, so its class is trivial. We
throw out these primes. If $p$ does not remain prime in $R$, then we include the class of
one of its two prime ideal factors $P$ in our set of generators. The class of the other
prime factor is its inverse. It may still happen that $P$ is a principal ideal, in which
case we discard it. The remaining primes generate $\mathcal{C}$.

Table (10.24) gives a few values which illustrate different groups. 


TABLE 10.24  SOME IDEAL CLASS GROUPS

    d       D      [μ]    Ideal class group
   -2      -8       1     trivial
   -5     -20       2     order 2
  -13     -52       4     order 2
  -14     -56       4     order 4, cyclic
  -21     -84       5     Klein four group
  -23     -23       3     order 3
  -26    -104       6     order 6
  -47     -47       4     order 5
  -71     -71       5     order 7
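
The bound $[\mu]$ is quickly computed for any $d$. The short sketch below is an illustration with function names of our own; it assumes $d$ is a square-free negative integer and uses the floor of $\mu$ for $[\mu]$, which agrees with "the largest integer less than $\mu$" whenever $\mu$ is not itself an integer. It also lists the integer primes that Proposition (10.23) asks us to examine.

```python
from math import floor, pi, sqrt

def discriminant(d):
    """D = d if d is congruent to 1 modulo 4, and D = 4d otherwise."""
    return d if d % 4 == 1 else 4 * d

def mu(d):
    """The bound 2*sqrt(|D|)/pi of Theorem (10.21)."""
    return 2 * sqrt(abs(discriminant(d))) / pi

def primes_up_to(n):
    """Integer primes p <= n, found by trial division."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, int(sqrt(p)) + 1))]

for d in (-2, -5, -13, -14, -21, -23, -26, -47, -71):
    m = floor(mu(d))
    print(d, discriminant(d), m, primes_up_to(m))
```

Each printed row gives $d$, $D$, $[\mu]$, and the primes to examine; the first three entries can be compared with the corresponding columns of Table (10.24).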


(10.25) Examples. To apply Proposition (10.23), we factor $(p)$ into prime ideals for
all prime integers $p \le [\mu]$.

(a) $d = -7$. In this case $[\mu] = 1$. Proposition (10.23) tells us that the class group $\mathcal{C}$
is generated by the empty set of prime ideals. So $\mathcal{C}$ is trivial, and $R$ is a unique
factorization domain.

(b) $d = -67$. Here $R = \mathbb{Z}[\eta]$, where $\eta = \tfrac12(1 + \delta)$, and $[\mu] = 5$. The ideal class
group is generated by the prime ideals dividing 2, 3, 5. According to Proposition
(9.3), a prime integer $p$ remains prime in $R$ if and only if the polynomial
$x^2 - x + 17$ is irreducible modulo $p$. This is true for each of the primes 2, 3, 5. So
the primes in question are principal, and the ideal class group is trivial.

(c) $d = -14$. Here $[\mu] = 4$, so $\mathcal{C}$ is generated by prime ideals dividing $(2)$ and $(3)$.
The polynomial $x^2 + 14$ is reducible, both modulo 2 and modulo 3, so by (9.3) neither
of these integers remains prime in $R$. Say that $(2) = P\bar{P}$ and $(3) = Q\bar{Q}$. As in
the discussion of $\mathbb{Z}[\sqrt{-5}]$, we find that $P = (2, \delta) = \bar{P}$. The ideal class $\langle P \rangle$ has
order 2 in $\mathcal{C}$.

To compute the order of the class $\langle Q \rangle$, we may compute the powers of the ideal
explicitly and find the first power whose lattice is similar to $R$. This is not efficient. It
is better to compute the norms of a few small elements of $R$, hoping to deduce a relation
among the generators. The most obvious elements to try are $\delta$ and $1 + \delta$. But
$N(\delta) = 14$ and $N(1 + \delta) = 15$. These are not as good as we may hope for, because
they involve the primes 7 and 5, whose factors are not among our generators.
We'd rather not bring in these extra primes. The element $2 + \delta$ is better:
$N(2 + \delta) = (2 + \delta)(2 - \delta) = 2 \cdot 3 \cdot 3$. This gives us the ideal relation

$(2 + \delta)(2 - \delta) = P\bar{P}\,Q\bar{Q}\,Q\bar{Q} = P^2 Q^2 \bar{Q}^2$.

Since $2 + \delta$ and $2 - \delta$ are not associates, they do not generate the same ideal. On
the other hand, they generate conjugate ideals. Taking these facts into account, the
only possible prime factorizations of $(2 + \delta)$ are $PQ^2$ and $P\bar{Q}^2$. Which case we have
depends on which factor of $(3)$ we label as $Q$. So we may suppose that $(2 + \delta) = PQ^2$.
Then since $(2 + \delta)$ is a principal ideal, $\langle P \rangle \langle Q \rangle^2 = 1$ in $\mathcal{C}$. Hence
$\langle Q \rangle^2 = \langle P \rangle^{-1} = \langle P \rangle$. This shows that $\mathcal{C}$ is the cyclic group of order 4 generated by $\langle Q \rangle$.
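
The search for a useful element in example (c) amounts to listing small elements $a + b\delta$ whose norm $a^2 + 14b^2$ involves only the primes 2 and 3. A sketch of that search follows; the function names and the search bound are illustrative choices, not part of the text.

```python
def norm(a, b, d):
    """Norm of a + b*delta, where delta^2 = d."""
    return a * a - d * b * b


def only_these_primes(n, primes):
    """True if n factors completely over the given primes."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def small_relations(d, primes, bound):
    """Elements a + b*delta with 0 <= a <= bound, 1 <= b <= bound and smooth norm."""
    found = []
    for a in range(bound + 1):
        for b in range(1, bound + 1):
            n = norm(a, b, d)
            if only_these_primes(n, primes):
                found.append((a, b, n))
    return found


print(norm(0, 1, -14), norm(1, 1, -14), norm(2, 1, -14))
print(small_relations(-14, (2, 3), 2))
```

With the bound 2 the only hit is $2 + \delta$, of norm 18, which is the element used in the text.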

(d) $d = -23$, and hence $R = \mathbb{Z}[\eta]$ where $\eta = \tfrac12(1 + \delta)$. Then $[\mu] = 3$, so $\mathcal{C}$ is
generated by the classes of the prime ideals dividing $(2)$ and $(3)$. Both of these
primes split in $R$, because the polynomial $x^2 - x + 6$ is reducible modulo 2 and
modulo 3 (9.3). In fact, $(2) = P\bar{P}$, where $P$ has the lattice base $(2, \eta)$ [see (7.8)].
This is not a principal ideal.

Say that $(3) = Q\bar{Q}$. To determine the structure of the ideal class group, we
note that $N(\eta) = 2 \cdot 3$ and $N(1 + \eta) = 2 \cdot 2 \cdot 2$. Therefore

$(\eta)(\bar{\eta}) = P\bar{P}Q\bar{Q}$ and $(1 + \eta)(1 + \bar{\eta}) = (8) = (2)^3 = P^3 \bar{P}^3$.

Interchanging the roles of $P, \bar{P}$ and of $Q, \bar{Q}$ as necessary, we obtain $(\eta) = PQ$ and
$(1 + \eta) = P^3$ or $\bar{P}^3$. Therefore $\langle P \rangle^3 = 1$ and $\langle Q \rangle = \langle P \rangle^{-1}$ in $\mathcal{C}$. The ideal class
group is a cyclic group of order 3. □

Proof of Lemma (10.19). This lemma is true for the unit ideal $R$. We will
prove that $[R : P] = N(P)$ if $P$ is a prime ideal, and we will show that if $P$ is prime
and if $A$ is an arbitrary nonzero ideal, then $[R : AP] = [R : A][R : P]$. It will follow
that if $[R : A] = N(A)$, then $[R : AP] = N(AP)$. Induction on the length of the
prime factorization of an ideal will complete the proof.

(10.26) Lemma. Let $n$ be an ordinary integer, and let $A$ be an ideal. Then

$[R : nA] = n^2 [R : A]$.

Proof. We know that $R \supset A \supset nA$, and therefore (10.14b) $[R : nA] = [R : A][A : nA]$.
Thus we must show that $[A : nA] = n^2$. Now $A$ is a lattice, and $nA$
is the sublattice obtained by stretching by the factor $n$:

(10.27) Figure. $3A = \{*\}$.

Clearly, $[A : nA] = n^2$, as required. □

We return to the proof of Lemma (10.19). There are two cases to consider for
the ideal $P$. According to (9.1), there is an integer prime $p$ so that either $P = (p)$ or
$P\bar{P} = (p)$.

In the first case, $N(P) = p^2$, and $AP = pA$. We can use Lemma (10.26) twice
to conclude that $[R : AP] = p^2 [R : A]$ and $[R : P] = p^2 [R : R] = p^2$. Thus
$[R : AP] = [R : A][R : P]$ and $[R : P] = N(P)$, as required.

In the second case, $N(P) = p$. We consider the chain of ideals
$A > AP > AP\bar{P}$. It follows from the Cancellation Law (8.11a) that this is a strictly
decreasing chain of ideals, hence that

(10.28)    $[R : A] < [R : AP] < [R : AP\bar{P}]$.

Also, since $P\bar{P} = (p)$, we have $AP\bar{P} = pA$. Therefore we may apply Lemma
(10.26) again, to conclude that $[R : AP\bar{P}] = p^2 [R : A]$. Since each index in (10.28) is
a proper divisor of the next, the only possibility is that $[R : AP] = p[R : A]$.
Applying this to the case $A = R$ shows that $[R : P] = p = N(P)$. So we find
$[R : AP] = [R : A][R : P]$ and $[R : P] = N(P)$ again. This completes the proof. □


11. REAL QUADRATIC FIELDS



In this section we will take a brief look at real quadratic number fields $\mathbb{Q}[\delta]$, where
$\delta^2 = d > 0$. We will use the field $\mathbb{Q}[\sqrt{2}]$ as an example. The ring of integers in
this field is

(11.1)    $R = \mathbb{Z}[\sqrt{2}] = \{a + b\sqrt{2} \mid a, b \in \mathbb{Z}\}$.

Since $\mathbb{Q}[\sqrt{d}]$ is a subfield of the real numbers, the ring of integers is not
embedded as a lattice in the complex plane, but we can represent $R$ as a lattice by using
the coefficients $(a, b)$ as coordinates. A slightly more convenient representation of $R$
as a lattice is obtained by associating to the algebraic integer $a + b\sqrt{d}$ the point
$(u, v)$, where

(11.2)    $u = a + b\sqrt{d}, \quad v = a - b\sqrt{d}$.

The resulting lattice is depicted below for the case d = 2: 



Since the $(u, v)$-coordinates are related to the $(a, b)$-coordinates by the linear
transformation (11.2), there is no essential difference between the two ways of depicting
$R$, though since the transformation is not orthogonal, the shape of the lattice is
different in the two representations.

Recall that the field $\mathbb{Q}[\sqrt{d}]$ is isomorphic to the abstractly constructed field

(11.4)    $F = \mathbb{Q}[x]/(x^2 - d)$.

Let us replace $\mathbb{Q}[\sqrt{d}]$ by $F$ and denote the residue of $x$ in $F$ by $\delta$. So this element $\delta$
is an abstract square root of $d$ rather than the positive real square root. Then the
coordinates $u, v$ represent the two ways that the abstractly given field $F$ can be
embedded into the real numbers; namely $u$ sends $\delta \mapsto \sqrt{d}$ and $v$ sends $\delta \mapsto -\sqrt{d}$.

For $\alpha = a + b\delta \in \mathbb{Q}[\delta]$, let us denote by $\alpha'$ the "conjugate" element
$a - b\delta$. The norm of $\alpha$ is defined to be

(11.5)    $N(\alpha) = \alpha\alpha' = a^2 - db^2$,

in analogy with the imaginary quadratic case (7.1). If $\alpha$ is an algebraic integer, then
$N(\alpha)$ is an integer, not necessarily positive, and

(11.6)    $N(\alpha\beta) = \alpha\beta\alpha'\beta' = N(\alpha)N(\beta)$.

With this definition of norm, the proof of unique factorization of ideals into prime
ideals in imaginary quadratic fields carries over.

There are two notable differences between real and imaginary quadratic fields.
The first is that, for real quadratic fields, ideals in the same class are not similar
geometric figures when embedded as lattices in the $(u, v)$-plane by (11.2). In particular,
principal ideals need not be similar to the lattice $R$. The reason is simple:
Multiplication by an element $\alpha = a + b\delta$ stretches the $u$-coordinate by the factor $a + b\sqrt{d}$,
and it stretches the $v$-coordinate by the different factor $a - b\sqrt{d}$. This fact
complicates the geometry slightly, and it is the reason that we developed the imaginary
quadratic case first. It does not change the theory in an essential way: The class
number is still finite.

The second difference is more important. It is that there are infinitely many
units in the rings of integers in a real quadratic field. Since the norm $N(\alpha)$ of an
algebraic integer is an ordinary integer, a unit must have norm $\pm 1$ as before [see
(7.3)], and if $N(\alpha) = \alpha\alpha' = \pm 1$, then $\pm\alpha'$ is the inverse of $\alpha$, so $\alpha$ is a unit. For
example,

(11.7)    $\alpha = 1 + \sqrt{2}, \quad \alpha^2 = 3 + 2\sqrt{2}$

are units in the ring $R = \mathbb{Z}[\sqrt{2}]$. Their norms are $-1$ and $1$ respectively. The
element $\alpha$ has infinite order in the group of units of $R$.
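
The powers of the unit $1 + \sqrt{2}$ can be tabulated directly. In the sketch below an element $a + b\sqrt{d}$ is stored as the integer pair `(a, b)`; the function names are illustrative and not from the text.

```python
def multiply(x, y, d=2):
    """Product of a + b*sqrt(d) and c + e*sqrt(d), as integer pairs."""
    a, b = x
    c, e = y
    return (a * c + d * b * e, a * e + b * c)


def element_norm(x, d=2):
    """N(a + b*sqrt(d)) = a^2 - d*b^2, the product u*v of the two coordinates."""
    return x[0] ** 2 - d * x[1] ** 2


def unit_powers(count, d=2):
    """The first `count` powers of alpha = 1 + sqrt(d)."""
    alpha = (1, 1)
    current = (1, 0)
    powers = []
    for _ in range(count):
        current = multiply(current, alpha, d)
        powers.append(current)
    return powers


for power in unit_powers(4):
    print(power, element_norm(power))
```

The coefficients grow without bound while the norms alternate between $-1$ and $1$, so no two powers coincide.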

The condition $N(\alpha) = a^2 - 2b^2 = \pm 1$ for units translates in $(u, v)$-coordinates
to

(11.8)    $uv = \pm 1$.

The units are the points of the lattice which lie on one of the two hyperbolas $uv = 1$
and $uv = -1$. These hyperbolas are depicted in Figure (11.3). It is a remarkable
fact that real quadratic fields always have infinitely many units or, what amounts to
the same thing, that the lattice of integers always contains infinitely many points on
the hyperbola $uv = 1$. This fact is not obvious, either algebraically or geometrically.

(11.9) Theorem. Let $R$ be the ring of integers in a real quadratic number field.
The group of units in $R$ is infinite.

(11.10) Lemma. Let $A$ denote the area of the parallelogram spanned by a lattice
basis of $R$, in its embedding into the $(u, v)$-plane. There are infinitely many elements
$\beta$ of $R$ whose norm $N(\beta)$ is bounded, in fact, such that $|N(\beta)| \le B$, where $B$ is any
real number $> A$.

Proof. In the embedding into the $(u, v)$-plane, the elements of norm $r$ are the
lattice points on the hyperbola $uv = r$, and the elements whose norm is bounded in
absolute value by a positive number $B$ are those lying in the region bounded by the
four branches of the hyperbolas $uv = B$, $uv = -B$.

(11.11) Figure.

Choose an arbitrary positive real number $u_0$. Then the rectangle $S$ whose vertices are
$(\pm u_0, \pm B/u_0)$ lies entirely in this region, and the area of this rectangle is $4B$. So if
$B > A$, then Minkowski's Lemma guarantees the existence of a nonzero lattice point
$\alpha$ in $S$. The norm of this point is bounded by $B$. This is true for all $u_0$, and if $u_0$ is
very large, the rectangle $S$ is very narrow. On the other hand, there are no lattice
points on the $u$-axis, because there are no nonzero elements in $R$ of norm zero. So
no particular lattice point is contained in all the rectangles $S$. It follows that there
are infinitely many lattice points in the region. □

Since there are only finitely many integers $r$ in the interval $-B \le r \le B$,
Lemma (11.10) implies the following corollary:

(11.12) Corollary. For some integer $r$, there are infinitely many elements of $R$ of
norm $r$. □

Let $r$ be an integer. We will call two elements $\beta_i = m_i + n_i\delta$ of $R$ congruent
modulo $r$ if $r$ divides $\beta_1 - \beta_2$ in $R$. If $d \equiv 2$ or $3$ (modulo 4), this just means that
$m_1 \equiv m_2$ and $n_1 \equiv n_2$ (modulo $r$).

(11.13) Lemma. Let $\beta_1, \beta_2$ be elements of $R$ with the same norm $r$, and which
are congruent modulo $r$. Then $\beta_1/\beta_2$ is a unit of $R$.

Proof. It suffices to show that $\beta_1/\beta_2$ is in $R$, because the same argument will
show that $\beta_2/\beta_1 \in R$, hence that $\beta_1/\beta_2$ is a unit. Let $\beta_i' = m_i - n_i\delta$ be the
conjugate of $\beta_i$. Then $\beta_1/\beta_2 = \beta_1\beta_2'/r$. But $\beta_2' \equiv \beta_1'$ (modulo $r$), so
$\beta_1\beta_2' \equiv \beta_1\beta_1' = r$ (modulo $r$). Therefore $r$ divides $\beta_1\beta_2'$, which shows that
$\beta_1/\beta_2 \in R$, as required. □

Proof of Theorem (11.9). We choose $r$ so that there are infinitely many
elements $\beta = m + n\delta$ of norm $r$. We partition the set of these elements according to
the congruence classes modulo $r$. Since there are finitely many congruence classes,
some class contains infinitely many elements. The ratio of any two of these elements
is a unit. □
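
The argument above can be tried out in $\mathbb{Z}[\sqrt{2}]$ with $r = 2$. The sketch below lists elements $m + n\delta$ of norm $r$ in a finite box, groups them by congruence class modulo $r$, and forms quotients as in Lemma (11.13). The box size, the restriction to positive $m, n$, and all function names are illustrative assumptions. The congruence test on coefficients is the one stated for $d \equiv 2$ or $3$ (modulo 4).

```python
def elements_of_norm(r, d, bound):
    """Pairs (m, n) with 1 <= m, n <= bound and m^2 - d*n^2 == r."""
    return [(m, n)
            for m in range(1, bound + 1)
            for n in range(1, bound + 1)
            if m * m - d * n * n == r]


def quotient(beta1, beta2, d, r):
    """beta1 / beta2 computed as beta1 * conj(beta2) / r, assuming N(beta2) == r.

    Returns None if the quotient does not have integer coefficients.
    """
    m1, n1 = beta1
    m2, n2 = beta2
    x = m1 * m2 - d * n1 * n2
    y = n1 * m2 - m1 * n2
    if x % r != 0 or y % r != 0:
        return None
    return (x // r, y // r)


def units_from_norm(r, d, bound):
    """Quotients of elements of norm r lying in the same class modulo r."""
    classes = {}
    for beta in elements_of_norm(r, d, bound):
        key = (beta[0] % abs(r), beta[1] % abs(r))
        classes.setdefault(key, []).append(beta)
    units = set()
    for members in classes.values():
        base = members[0]
        for beta in members[1:]:
            units.add(quotient(beta, base, d, r))
    return sorted(units)


print(elements_of_norm(2, 2, 60))
print(units_from_norm(2, 2, 60))
```

For example $(10 + 7\sqrt{2})/(2 + \sqrt{2}) = 3 + 2\sqrt{2}$, which is the unit $\alpha^2$ of (11.7).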


12. SOME DIOPHANTINE EQUATIONS

Diophantine equations are polynomial equations with integer coefficients, which are
to be solved in the integers. The most famous is the Fermat Equation

(12.1)    $x^n + y^n = z^n$.

Fermat's "Last Theorem" asserts that if $n \ge 3$ this equation has no integer solutions
$x, y, z$, except for the trivial solutions in which one of the variables is zero. Fermat
wrote this theorem in the margin of a book, asserting that the margin did not
contain enough room for his proof. No proof is known today, though the theorem has
been proved for all $n < 10^5$. Also, a theorem proved by Faltings in 1983, which
applies to this equation as well as to many others, shows that there are only finitely
many integer solutions for any given value of $n$.

This section contains a few examples of Diophantine equations which can be 
solved using the arithmetic of imaginary quadratic numbers. They are included only 
as samples. An interested reader should look in a book on number theory for a more 
organized discussion. 

We have two methods at our disposal, namely arithmetic of quadratic number 
fields and congruences, and we will use both. 

(12.2) Example. Determination of the integers $n$ such that the equation

$x^2 + y^2 = n$

has an integer solution.

Here the problem is to determine the integers $n$ which are sums of two squares
or, equivalently, such that there is a point with integer coordinates on the circle
$x^2 + y^2 = n$. Theorem (5.1) tells us that when $p$ is a prime, the equation $x^2 + y^2 = p$
has an integer solution if and only if either $p = 2$ or $p \equiv 1$ (modulo 4). It is not
difficult to extend this result to arbitrary integers. To do so, we interpret a sum of
squares $a^2 + b^2$ as the norm $\alpha\bar{\alpha}$ of the Gauss integer $\alpha = a + bi$. Then the
problem is to decide which integers $n$ are the norms of Gauss integers. Now if a Gauss
integer $\alpha$ is factored into Gauss primes, say $\alpha = \pi_1 \cdots \pi_k$, then its norm factors too:
$N(\alpha) = N(\pi_1) \cdots N(\pi_k)$. So if $n$ is the norm of a Gauss integer, then it is a product
of norms of Gauss primes, and conversely. The norms of Gauss primes are the
primes $p \equiv 1$ (modulo 4), the squares of primes $p \equiv 3$ (modulo 4), and the prime
2. Thus we have the following theorem:

(12.3) Theorem. The equation $x^2 + y^2 = n$ has an integer solution if and only if
every prime $p$ which is congruent 3 modulo 4 has an even exponent in the
factorization of $n$. □
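
Theorem (12.3) can be compared with a brute-force search for small $n$. The following sketch does this; the function names and the range tested are illustrative choices. Solutions with $x = 0$ or $y = 0$ are allowed, as in the statement of the theorem.

```python
from math import isqrt


def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 for some integers x, y >= 0?"""
    for x in range(isqrt(n) + 1):
        rest = n - x * x
        if isqrt(rest) ** 2 == rest:
            return True
    return False


def satisfies_criterion(n):
    """Every prime p = 3 (mod 4) has an even exponent in the factorization of n."""
    p = 2
    while p * p <= n:
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if p % 4 == 3 and exponent % 2 == 1:
            return False
        p += 1
    # What remains is 1 or a single prime occurring with exponent 1.
    return n % 4 != 3


mismatches = [n for n in range(1, 2001)
              if is_sum_of_two_squares(n) != satisfies_criterion(n)]
print(mismatches)
```

An empty list of mismatches means the criterion and the direct search agree on the whole range.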

(12.4) Example. Determination of the integer solutions of the equation

$y^2 + 13 = x^3$.

We factor the left side of the equation, obtaining

$(y + \delta)(y - \delta) = x^3$,

where $\delta = \sqrt{-13}$. The ring of integers $R = \mathbb{Z}[\delta]$ is not a unique factorization
domain, so we will analyze this equation using ideal factorization.

(12.5) Lemma. Let $a, b$ be integers, and let $R$ be any ring containing $\mathbb{Z}$ as a
subring. If $a$ and $b$ are contained in a common proper ideal $A$ of $R$, then they have a
common prime factor in $\mathbb{Z}$.

Proof. We prove the contrapositive. If $a, b$ have no common prime factor in
$\mathbb{Z}$, then we can write $1 = ra + sb$, $r, s \in \mathbb{Z}$. This equation shows that if $a, b$ are in
an ideal $A$ of $R$, then $1 \in A$ too. Hence $A$ is not a proper ideal. □
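
The integers $r, s$ in the relation $1 = ra + sb$ can be produced by the extended Euclidean algorithm. The text does not name a method for finding them, so the following is one standard choice, given as a sketch; the function name `bezout` is illustrative.

```python
def bezout(a, b):
    """Return (g, r, s) with g = gcd(a, b) and g == r*a + s*b, for a, b > 0."""
    old_rem, rem = a, b
    old_r, r = 1, 0
    old_s, s = 0, 1
    while rem != 0:
        q = old_rem // rem
        old_rem, rem = rem, old_rem - q * rem
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    return old_rem, old_r, old_s


g, r, s = bezout(13, 70)
print(g, r, s, r * 13 + s * 70)
```

Since 13 and 70 have no common prime factor, the output exhibits 1 as an integer combination of them, so no proper ideal of any ring containing $\mathbb{Z}$ contains both.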

(12.6) Lemma. Let $x, y$ be an integer solution of the equation (12.4). The two
elements $y + \delta$ and $y - \delta$ have no common prime ideal factor in $R$.

Proof. Let $P$ be a prime ideal of $R$ which contains $y + \delta$ and $y - \delta$. Then
$2y \in P$ and $2\delta \in P$. Since $P$ is a prime ideal, either $2 \in P$, or else $y \in P$ and
$\delta \in P$.

In the first case, 2 and $y^2 + 13$ are not relatively prime integers by Lemma
(12.5), and since 2 is prime, it divides $y^2 + 13$ in $\mathbb{Z}$. This implies that 2 divides $x$
and that 8 divides $y^2 + 13 = x^3$. So $y$ must be odd. Then $y^2 \equiv 1$ (modulo 4); hence
$y^2 + 13 \equiv 2$ (modulo 4). This contradicts $x^3 \equiv 0$ (modulo 8).

Suppose that $y, \delta \in P$. Then $13 \in P$, and hence 13 and $y$ are not relatively
prime in $\mathbb{Z}$, that is, 13 divides $y$. Therefore 13 divides $x$, and reading the equation
$y^2 + 13 = x^3$ modulo $13^2$, we obtain $13 \equiv 0$ (modulo $13^2$), which is a contradiction.
So we have shown that $y + \delta$ and $y - \delta$ are relatively prime in $R$. □

We now read the equation $(y + \delta)(y - \delta) = (x)^3$ as an equality of principal
ideals of $R$, and we factor the right side into primes, say

$(y + \delta)(y - \delta) = (P_1 \cdots P_s)^3$.

On the right we have a cube, and the two ideals on the left have no common prime
factor. It follows that each of these ideals is a cube too, say $(y + \delta) = A^3$ and
$(y - \delta) = \bar{A}^3$ for some ideal $A$. Looking at our table of ideal classes, we find that
the ideal class group of $R$ is cyclic of order 2. So the ideal classes of $A$ and $A^3$ are
equal. Since $A^3$ is a principal ideal, so is $A$, say $A = (u + v\delta)$, for some integers
$u, v$. We have been lucky. Since the units in $R$ are $\pm 1$, $(u + v\delta)^3 = \pm(y + \delta)$.
Changing sign if necessary, we may assume that $(u + v\delta)^3 = y + \delta$.

We now complete the analysis by studying the equation $y + \delta = (u + v\delta)^3$.
We expand the right side, obtaining

$y + \delta = (u^3 - 39uv^2) + (3u^2 v - 13v^3)\delta$.

So $y = u^3 - 39uv^2$ and $1 = (3u^2 - 13v^2)v$. The second equation implies that $v = \pm 1$
and that $3u^2 - 13 = \pm 1$. The only possibilities are $u = \pm 2$ and $v = -1$. Then
$y = \pm 70$ and $x = (u + v\delta)(u - v\delta) = 17$. These values do give solutions, so the
integer solutions of the equation $y^2 + 13 = x^3$ are $x = 17$ and $y = \pm 70$. □
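
The last step of Example (12.4) can be checked numerically. The sketch below expands $(u + v\delta)^3$ using $\delta^2 = -13$, searches a finite box for pairs $(u, v)$ whose cube has $\delta$-coefficient 1, and separately runs a brute-force search over $x$. The bounds and function names are illustrative; the brute-force search only confirms the result within its range and is not a proof.

```python
from math import isqrt


def cube(u, v):
    """(u + v*delta)^3 with delta^2 = -13, as (integer part, delta coefficient)."""
    return (u**3 - 39 * u * v**2, 3 * u**2 * v - 13 * v**3)


def cube_candidates(bound):
    """Pairs (u, v) with |u|, |v| <= bound whose cube has the form y + delta."""
    return [(u, v)
            for u in range(-bound, bound + 1)
            for v in range(-bound, bound + 1)
            if cube(u, v)[1] == 1]


def brute_force(max_x):
    """All (x, y) with y >= 0, 3 <= x <= max_x and y^2 + 13 == x^3."""
    solutions = []
    for x in range(3, max_x + 1):
        target = x**3 - 13
        y = isqrt(target)
        if y * y == target:
            solutions.append((x, y))
    return solutions


for u, v in cube_candidates(10):
    y = cube(u, v)[0]
    x = u * u + 13 * v * v        # x = N(u + v*delta)
    print((u, v), (x, y), y * y + 13 == x**3)
print(brute_force(1000))
```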

(12.7) Example. Determination of the prime integers $p$ such that

$x^2 + 5y^2 = p$

has an integer solution.

Let $\delta = \sqrt{-5}$, and let $R = \mathbb{Z}[\delta]$. We know (9.3a) that the principal ideal $(p)$
splits in $R$ if and only if the congruence $x^2 \equiv -5$ (modulo $p$) has an integer solution.
If $(p) = P\bar{P}$ and if $P$ is a principal ideal, say $P = (a + b\delta)$, then
$(p) = (a + b\delta)(a - b\delta) = (a^2 + 5b^2)$. Since the only units in $R$ are $\pm 1$,
$a^2 + 5b^2 = \pm p$, and since $a^2 + 5b^2$ is positive, $a^2 + 5b^2 = p$.

Unfortunately, $R$ is not a principal ideal domain. So it is quite likely that $(p) = P\bar{P}$
but that $P$ is not a principal ideal. To analyze the situation further, we use the
fact that there are exactly two ideal classes in $R$. The principal ideals form one class,
and the other class is represented by any nonprincipal ideal. The ideal
$A = (2, 1 + \delta)$ is one nonprincipal ideal, and we recall that for this ideal
$A^2 = A\bar{A} = (2)$. Now since the ideal class group is cyclic of order 2, the product of
any two ideals in the same class is principal. Suppose that $(p) = P\bar{P}$ and that $P$ is
not a principal ideal. Then $AP$ is principal, say $AP = (a + b\delta)$. Then
$(a + b\delta)(a - b\delta) = AP\bar{A}\bar{P} = (2p)$. We find that $a^2 + 5b^2 = 2p$.

(12.8) Lemma. Let $p$ be an odd prime. The congruence $x^2 \equiv -5$ (modulo $p$) has
a solution if and only if one of the two equations $x^2 + 5y^2 = p$ or $x^2 + 5y^2 = 2p$
has an integer solution.

Proof. If the congruence has a solution, then $(p) = P\bar{P}$, and the two cases are
decided as above, according to whether or not $P$ is principal. Conversely, if
$a^2 + 5b^2 = p$, then $(p)$ splits in $R$, and we can apply (9.3a). If $a^2 + 5b^2 = 2p$,
then $(a + b\delta)(a - b\delta) = (2p) = A\bar{A}(p)$. It follows from unique factorization of
ideals that $(p)$ splits too, so (9.3a) can be applied again. □

This lemma does not solve our original problem, but we have made progress.
In most such situations we could not complete our analysis. But here we are lucky
again, or rather this example was chosen because it admits a complete solution: The
two cases can be distinguished by congruences. If $a^2 + 5b^2 = p$, then one of the
two integers $a, b$ is odd and the other is even. We compute the congruence modulo
4, finding that $a^2 + 5b^2 \equiv 1$ (modulo 4). Hence $p \equiv 1$ (modulo 4) in this case. If
$a^2 + 5b^2 = 2p$, we compute the congruences modulo 8. Since $p \equiv 1$ or $3$ (modulo
4), we know that $2p \equiv 2$ or $6$ (modulo 8). Any square is congruent 0, 1, or 4
(modulo 8). Hence $5b^2 \equiv 0$, 5, or 4 (modulo 8), which shows that $a^2 + 5b^2$ cannot be
congruent to 2 (modulo 8). Thus $p \equiv 3$ (modulo 4) in this case. We have therefore
proved the following lemma:

(12.9) Lemma. Let $p$ be an odd prime. Assume that the congruence $x^2 \equiv -5$
(modulo $p$) has a solution. Then $x^2 + 5y^2 = p$ has an integer solution if $p \equiv 1$
(modulo 4), and $x^2 + 5y^2 = 2p$ has an integer solution if $p \equiv 3$ (modulo 4).
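
Lemmas (12.8) and (12.9) can be tested on small primes. The sketch below finds the odd primes $p$ for which $x^2 \equiv -5$ (modulo $p$) is solvable and records whether $p$ or $2p$ has the form $a^2 + 5b^2$. The helper names and the range of primes are illustrative assumptions.

```python
from math import isqrt


def is_prime(n):
    """Trial division primality test."""
    if n < 2:
        return False
    return all(n % q != 0 for q in range(2, isqrt(n) + 1))


def minus5_is_square(p):
    """Does x^2 = -5 (mod p) have a solution?"""
    return any((x * x + 5) % p == 0 for x in range(p))


def represents(n):
    """Is n = a^2 + 5*b^2 for some integers a, b >= 0?"""
    for b in range(isqrt(n // 5) + 1):
        rest = n - 5 * b * b
        if isqrt(rest) ** 2 == rest:
            return True
    return False


def dichotomy(limit):
    """Triples (p, p is represented, 2p is represented) for the relevant primes."""
    return [(p, represents(p), represents(2 * p))
            for p in range(3, limit)
            if is_prime(p) and minus5_is_square(p)]


for p, by_p, by_2p in dichotomy(60):
    print(p, p % 4, by_p, by_2p)
```

In each row exactly one of the two flags should be set, and which one is determined by $p$ modulo 4.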

There remains finally the problem of characterizing the odd primes $p$ such that
the congruence $x^2 \equiv -5$ has a solution modulo $p$. This is done by means of the
amazing Quadratic Reciprocity Law, which asserts that $x^2 \equiv 5$ (modulo $p$) has a
solution if and only if $x^2 \equiv p$ (modulo 5) has one! And the second congruence has a
solution if and only if $p \equiv \pm 1$ (modulo 5). Combining this with the previous lemma
and with the fact that $-1$ is a square modulo 5, we find:

(12.10) Theorem. Let $p$ be an odd prime. The equation $x^2 + 5y^2 = p$ has an
integer solution if and only if $p \equiv 1$ (modulo 4) and $p \equiv \pm 1$ (modulo 5). □
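
Theorem (12.10) can be compared with a direct search. The sketch below lists the odd primes below a bound that have the form $x^2 + 5y^2$ and the odd primes satisfying the two congruences. The prime $p = 5$ is left out of the comparison: it equals $0^2 + 5 \cdot 1^2$ but is not congruent to $\pm 1$ modulo 5, so it falls outside the reciprocity step used above. The function names and the bound are illustrative.

```python
from math import isqrt


def is_prime(n):
    """Trial division primality test."""
    if n < 2:
        return False
    return all(n % q != 0 for q in range(2, isqrt(n) + 1))


def represents(n):
    """Is n = x^2 + 5*y^2 for some integers x, y >= 0?"""
    for y in range(isqrt(n // 5) + 1):
        rest = n - 5 * y * y
        if isqrt(rest) ** 2 == rest:
            return True
    return False


def primes_by_search(limit):
    """Odd primes p < limit, p != 5, of the form x^2 + 5y^2."""
    return [p for p in range(3, limit)
            if p != 5 and is_prime(p) and represents(p)]


def primes_by_congruence(limit):
    """Odd primes p < limit with p = 1 (mod 4) and p = +-1 (mod 5)."""
    return [p for p in range(3, limit)
            if is_prime(p) and p % 4 == 1 and p % 5 in (1, 4)]


print(primes_by_search(200))
print(primes_by_congruence(200))
```

The two lists should coincide; equivalently, the primes found are those congruent to 1 or 9 modulo 20.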

Nullum vero dubium nobis esse videtur,
quin multa eaque egregia in hoc genere adhuc lateant
in quibus alii vires suas exercere possint.

Karl Friedrich Gauss 


EXERCISES 

1. Factorization of Integers and Polynomials

1. Let $a, b$ be positive integers whose sum is a prime $p$. Prove that their greatest common
divisor is 1.

2. Define the greatest common divisor of a set of $n$ integers, and prove its existence.

3. Prove that if $d$ is the greatest common divisor of $a_1, \dots, a_n$, then the greatest common
divisor of $a_1/d, \dots, a_n/d$ is 1.

4. (a) Prove that if $n$ is a positive integer which is not a square of an integer, then $\sqrt{n}$ is
not a rational number.

(b) Prove the analogous statement for $n$th roots.

5. (a) Let $a, b$ be integers with $a \ne 0$, and write $b = aq + r$, where $0 \le r < |a|$. Prove
that the two greatest common divisors $(a, b)$ and $(a, r)$ are equal.

(b) Describe an algorithm, based on (a), for computing the greatest common divisor.

(c) Use your algorithm to compute the greatest common divisors of the following:

(a) 1456, 235, (b) 123456789, 135792468.

6. Compute the greatest common divisor of the following polynomials: $x^3 - 6x^2 + x + 4$,
$x^5 - 6x + 1$.

7. Prove that if two polynomials $f, g$ with coefficients in a field $F$ factor into linear factors
in $F$, then their greatest common divisor is the product of their common linear factors.

8. Factor the following polynomials into irreducible factors in $\mathbb{F}_p[x]$.

(a) $x^3 + x + 1$, $p = 2$ (b) $x^2 - 3x - 3$, $p = 5$ (c) $x^2 + 1$, $p = 7$

9. Euclid proved that there are infinitely many prime integers in the following way: If
$p_1, \dots, p_k$ are primes, then any prime factor $p$ of $n = (p_1 \cdots p_k) + 1$ must be different
from all of the $p_i$.

(a) Adapt this argument to show that for any field $F$ there are infinitely many monic
irreducible polynomials in $F[x]$.

(b) Explain why the argument fails for the formal power series ring $F[[x]]$.

10. Partial fractions for integers:

(a) Write the fraction $r = 7/24$ in the form $r = a/8 + b/3$.

(b) Prove that if $n = uv$, where $u$ and $v$ are relatively prime, then every fraction
$r = m/n$ can be written in the form $r = a/u + b/v$.

(c) Let $n = n_1 n_2 \cdots n_k$ be the factorization of an integer $n$ into powers of distinct primes:
$n_i = p_i^{e_i}$. Prove that every fraction $r = m/n$ can be written in the form
$r = m_1/n_1 + \cdots + m_k/n_k$.

11. Chinese Remainder Theorem:

(a) Let $n, m$ be relatively prime integers, and let $a, b$ be arbitrary integers. Prove that
there is an integer $x$ which solves the simultaneous congruence $x \equiv a$ (modulo $m$)
and $x \equiv b$ (modulo $n$).

(b) Determine all solutions of these two congruences.

12. Solve the following simultaneous congruences.

(a) $x \equiv 3$ (modulo 15), $x \equiv 5$ (modulo 8), $x \equiv 2$ (modulo 7).

(b) $x \equiv 13$ (modulo 43), $x \equiv 7$ (modulo 71).

13. Partial fractions for polynomials:

(a) Prove that every rational function in $\mathbb{C}(x)$ can be written as sum of a polynomial and
a linear combination of functions of the form $1/(x - a)^i$.

(b) Find a basis for $\mathbb{C}(x)$ as vector space over $\mathbb{C}$.

*14. Let $F$ be a subfield of $\mathbb{C}$, and let $f \in F[x]$ be an irreducible polynomial. Prove that $f$ has
no multiple root in $\mathbb{C}$.

15. Prove that the greatest common divisor of two polynomials $f$ and $g$ in $\mathbb{Q}[x]$ is also their
greatest common divisor in $\mathbb{C}[x]$.

16. Let $a$ and $b$ be relatively prime integers. Prove that there are integers $m, n$ such that
$a^m + b^n \equiv 1$ (modulo $ab$).

2. Unique Factorization Domains, Principal Ideal Domains,
and Euclidean Domains

1. Prove or disprove the following.

(a) The polynomial ring $\mathbb{R}[x, y]$ in two variables is a Euclidean domain.

(b) The ring $\mathbb{Z}[x]$ is a principal ideal domain.

2. Prove that the following rings are Euclidean domains.

(a) $\mathbb{Z}[\zeta]$, $\zeta = e^{2\pi i/3}$ (b) $\mathbb{Z}[\sqrt{-2}]$

3. Give an example showing that division with remainder need not be unique in a Euclidean 
domain. 

4. Let $m, n$ be two integers. Prove that their greatest common divisor in $\mathbb{Z}$ is the same as
their greatest common divisor in $\mathbb{Z}[i]$.

5. Prove that every prime element of an integral domain is irreducible. 

6. Prove Proposition (2.8), that a domain R which has existence of factorizations is a 
unique factorization domain if and only if every irreducible element is prime. 

7. Prove that in a principal ideal domain $R$, every pair $a, b$ of elements, not both zero, has
a greatest common divisor $d$, with these properties:

(i) $d = ar + bs$, for some $r, s \in R$;

(ii) $d$ divides $a$ and $b$;

(iii) if $e \in R$ divides $a$ and $b$, it also divides $d$.

Moreover, $d$ is determined up to unit factor.

8. Find the greatest common divisor of $(11 + 7i, 18 - i)$ in $\mathbb{Z}[i]$.

9. (a) Prove that $2, 3, 1 \pm \sqrt{-5}$ are irreducible elements of the ring $R = \mathbb{Z}[\sqrt{-5}]$ and
that the units of this ring are $\pm 1$.

(b) Prove that existence of factorizations is true for this ring.

10. Prove that the ring $\mathbb{R}[[t]]$ of formal real power series is a unique factorization domain.

11. (a) Prove that if $R$ is an integral domain, then two elements $a, b$ are associates if and
only if they differ by a unit factor.

*(b) Give an example showing that (a) is false when $R$ is not an integral domain.

12. Let $R$ be a principal ideal domain.

(a) Prove that there is a least common multiple $[a, b] = m$ of two elements which are not
both zero such that $a$ and $b$ divide $m$, and that if $a, b$ divide an element $r \in R$, then
$m$ divides $r$. Prove that $m$ is unique up to unit factor.

(b) Denote the greatest common divisor of $a$ and $b$ by $(a, b)$. Prove that $(a, b)[a, b]$ is an
associate of $ab$.

13. If $a, b$ are integers and if $a$ divides $b$ in the ring of Gauss integers, then $a$ divides $b$ in $\mathbb{Z}$.

14. (a) Prove that the ring $R$ (2.4) obtained by adjoining $2^k$-th roots $x_k$ of $x$ to a polynomial
ring is the union of the polynomial rings $F[x_k]$.

(b) Prove that there is no factorization of $x_1$ into irreducible factors in $R$.

15. By a refinement of a factorization $a = b_1 \cdots b_k$ we mean the expression for $a$ obtained by
factoring the terms $b_i$. Let $R$ be the ring (2.4). Prove that any two factorizations of the
same element $a \in R$ have refinements, all of whose factors are associates.

16. Let $R$ be the ring $F[u, v, y, x_1, x_2, x_3, \dots]/(x_1 y = uv,\ x_2^2 = x_1,\ x_3^2 = x_2, \dots)$. Show that
$u, v$ are irreducible elements in $R$ but that the process of factoring $uv$ need not terminate.

17. Prove Proposition (2.9) and Corollary (2.10).

18. Prove Proposition (2.11).

19. Prove that the factorizations (2.22) are prime in $\mathbb{Z}[i]$.

20. The discussion of unique factorization involves only the multiplication law on the ring $R$,
so it ought to be possible to extend the definitions. Let $S$ be a commutative semigroup,
meaning a set with a commutative and associative law of composition and with an
identity. Suppose the Cancellation Law holds in $S$: If $ab = ac$ then $b = c$. Make the
appropriate definitions so as to extend Proposition (2.8) to this situation.

*21. Given elements $v_1, \dots, v_n$ in $\mathbb{Z}^2$, we can define a semigroup $S$ as the set of all linear
combinations of $(v_1, \dots, v_n)$ with nonnegative integer coefficients, the law of composition
being addition. Determine which of these semigroups has unique factorization.

3. Gauss's Lemma


1. Let $a, b$ be elements of a field $F$, with $a \ne 0$. Prove that a polynomial $f(x) \in F[x]$ is
irreducible if and only if $f(ax + b)$ is irreducible.

2. Let $F = \mathbb{C}(x)$, and let $f, g \in \mathbb{C}[x, y]$. Prove that if $f$ and $g$ have a common factor in
$F[y]$, then they also have a common factor in $\mathbb{C}[x, y]$.

3. Let $f$ be an irreducible polynomial in $\mathbb{C}[x, y]$, and let $g$ be another polynomial. Prove that
if the variety of zeros of $g$ in $\mathbb{C}^2$ contains the variety of zeros of $f$, then $f$ divides $g$.

4. Prove that two integer polynomials are relatively prime in $\mathbb{Q}[x]$ if and only if the ideal
they generate in $\mathbb{Z}[x]$ contains an integer.

5. Prove Gauss's Lemma without reduction modulo $p$, in the following way: Let $a_i$ be the
coefficient of lowest degree $i$ of $f$ which is not divisible by $p$. So $p$ divides $a_\nu$ if $\nu < i$, but
$p$ does not divide $a_i$. Similarly, let $b_j$ be the coefficient of lowest degree of $g$ which is not
divisible by $p$. Prove that the coefficient of $h$ of degree $i + j$ is not divisible by $p$.

6. State and prove Gauss's Lemma for Euclidean domains.

7. Prove that an integer polynomial is primitive if and only if it is not contained in any of
the kernels of the maps (3.2).


8. Prove that $\det \begin{bmatrix} x & y \\ z & w \end{bmatrix}$ is irreducible in the polynomial ring $\mathbb{C}[x, y, z, w]$.

9. Prove that the kernel of the homomorphism $\mathbb{Z}[x] \to \mathbb{R}$ sending $x \mapsto 1 + \sqrt{2}$ is a
principal ideal, and find a generator for this ideal.

10. (a) Consider the map $\psi: \mathbb{C}[x, y] \to \mathbb{C}[t]$ defined by $f(x, y) \mapsto f(t^2, t^3)$. Prove that its
kernel is a principal ideal, and that its image is the set of polynomials $p(t)$ such that
$p'(0) = 0$.

(b) Consider the map $\varphi: \mathbb{C}[x, y] \to \mathbb{C}[t]$ defined by $f(x, y) \mapsto f(t^2 - t, t^3 - t^2)$.
Prove that $\ker \varphi$ is a principal ideal, and that its image is the set of polynomials $p(t)$
such that $p(0) = p(1)$. Give an intuitive explanation in terms of the geometry of the
variety $\{f = 0\}$ in $\mathbb{C}^2$.


4. Explicit Factorization of Polynomials


1. Prove that the following polynomials are irreducible in $\mathbb{Q}[x]$.

(a) $x^2 + 21x + 213$ (b) $x^3 + 6x + 12$ (c) $8x^3 - 6x + 1$ (d) $x^3 + 6x^2 + 1$

(e) $x^5 - 3x^4 + 3$

2. Factor $x^5 + 5x + 5$ into irreducible factors in $\mathbb{Q}[x]$ and in

3. Factor $x^3 + x + 1$ in $\mathbb{F}_p[x]$, when $p = 2, 3, 5$.

4. Factor $x^4 + x^2 + 1$ into irreducible factors in $\mathbb{Q}[x]$.

5. Suppose that a polynomial of the form $x^4 + bx^2 + c$ is a product of two quadratic
factors in $\mathbb{Q}[x]$. What can you say about the coefficients of these factors?

6. Prove that the following polynomials are irreducible.

(a) $x^2 + x + 1$ in the field $\mathbb{F}_2$ (b) $x^2 + 1$ in $\mathbb{F}_7$ (c) $x^3 - 9$ in $\mathbb{F}_{31}$

7. Factor the following polynomials into irreducible factors in $\mathbb{Q}[x]$.

(a) $x^3 - 3x - 2$ (b) $x^3 - 3x + 2$ (c) $x^9 - 6x^6 + 9x^3 - 3$

8. Let $p$ be a prime integer. Prove that the polynomial $x^n - p$ is irreducible in $\mathbb{Q}[x]$.

9. Using reduction modulo 2 as an aid, factor the following polynomials in $\mathbb{Q}[x]$.

(a) $x^2 + 2345x + 125$ (b) $x^3 + 5x^2 + 10x + 5$ (c) $x^3 + 2x^2 + 3x + 1$

(d) $x^4 + 2x^3 + 2x^2 + 2x + 2$ (e) $x^4 + 2x^3 + 3x^2 + 2x + 1$

(f) $x^4 + 2x^3 + x^2 + 2x + 1$ (g) $x^5 + x^4 - 4x^3 + 2x^2 + 4x + 1$

10. Let $p$ be a prime integer, and let $f \in \mathbb{Z}[x]$ be a polynomial of degree $2n + 1$, say
$f(x) = a_{2n+1}x^{2n+1} + \cdots + a_1 x + a_0$. Suppose that $a_{2n+1} \not\equiv 0$ (modulo $p$),
$a_0, a_1, \dots, a_n \equiv 0$ (modulo $p^2$), $a_{n+1}, \dots, a_{2n} \equiv 0$ (modulo $p$), $a_0 \not\equiv 0$ (modulo $p^3$). Prove
that $f$ is irreducible in $\mathbb{Q}[x]$.

11. Let $p$ be a prime, and let $A$ be an $n \times n$ integer matrix such that $A^p = I$ but $A \ne I$.
Prove that $n \ge p - 1$.

12. Determine the monic irreducible polynomials of degree 3 over $\mathbb{F}_3$.

13. Determine the monic irreducible polynomials of degree 2 over $\mathbb{F}_5$.

14. Lagrange interpolation formula:

(a) Let $x_0, \dots, x_n$ be distinct complex numbers. Determine a polynomial $p(x)$ of degree $n$
which is zero at $x_1, \dots, x_n$ and such that $p(x_0) = 1$.

(b) Let $x_0, \dots, x_d; y_0, \dots, y_d$ be complex numbers, and suppose that the $x_i$ are all different.
There is a unique polynomial $g(x) \in \mathbb{C}[x]$ of degree $\le d$, such that $g(x_i) = y_i$ for
each $i = 0, \dots, d$. Prove this by determining the polynomial $g$ explicitly in terms of
$x_i, y_i$.

*15. Use the Lagrange interpolation formula to give a method of finding all integer
polynomial factors of an integer polynomial in a finite number of steps.

16. Let $f(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ be a monic polynomial with integer
coefficients, and let $r \in \mathbb{Q}$ be a rational root of $f(x)$. Prove that $r$ is an integer.

17. Prove that the polynomial $x^2 + y^2 - 1$ is irreducible by the method of undetermined
coefficients, that is, by studying the equation $(ax + by + c)(a'x + b'y + c') = x^2 + y^2 - 1$,
where $a, b, c, a', b', c'$ are unknown.


5. Primes in the Ring of Gauss Integers 

1. Prove that every Gauss prime divides exactly one integer prime.

2. Factor 30 into primes in $\mathbb{Z}[i]$.

3. Factor the following into Gauss primes.

(a) $1 - 3i$ (b) $10$ (c) $6 + 9i$

4. Make a neat drawing showing the primes in the ring of Gauss integers in a reasonable
size range.

5. Let $\pi$ be a Gauss prime. Prove that $\pi$ and $\bar{\pi}$ are associate if and only if either $\pi$ is
associate to an integer prime or $\pi\bar{\pi} = 2$.

6. Let $R$ be the ring $\mathbb{Z}[\sqrt{3}]$. Prove that a prime integer $p$ is a prime element of $R$ if and only
if the polynomial $x^2 - 3$ is irreducible in $\mathbb{F}_p[x]$.

7. Describe the residue ring $\mathbb{Z}[i]/(p)$ in each case.

(a) $p = 2$ (b) $p \equiv 1$ (modulo 4) (c) $p \equiv 3$ (modulo 4)

*8. Let $R = \mathbb{Z}[\zeta]$, where $\zeta = \tfrac12(-1 + \sqrt{-3})$ is a complex cube root of 1. Let $p$ be an integer
prime $\ne 3$. Adapt the proof of Theorem (5.1) to prove the following.

(a) The polynomial $x^2 + x + 1$ has a root in $\mathbb{F}_p$ if and only if $p \equiv 1$ (modulo 3).

(b) $(p)$ is a prime ideal of $R$ if and only if $p \equiv -1$ (modulo 3).

(c) $p$ factors in $R$ if and only if it can be written in the form $p = a^2 + ab + b^2$, for
some integers $a, b$.

(d) Make a drawing showing the primes of absolute value $\le 10$ in $R$.

6. Algebraic Integers 

1. Is $\tfrac{1}{2}(1 + \sqrt{3})$ an algebraic integer?

2. Let $\alpha$ be an algebraic integer whose monic irreducible polynomial over $\mathbb{Z}$ is
$x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$, and let $R = \mathbb{Z}[\alpha]$. Prove that $\alpha$ is a unit in $R$ if and
only if $a_0 = \pm 1$.

3. Let $d, d'$ be distinct square-free integers. Prove that $\mathbb{Q}(\sqrt{d})$ and $\mathbb{Q}(\sqrt{d'})$ are different
subfields of $\mathbb{C}$.

4. Prove that existence of factorizations is true in the ring of integers in an imaginary 
quadratic number field. 

5. Let $\alpha$ be the real cube root of 10, and let $\beta = a + b\alpha + c\alpha^2$, with $a, b, c \in \mathbb{Q}$. Then
$\beta$ is the root of a monic cubic polynomial $f(x) \in \mathbb{Q}[x]$. The irreducible polynomial for $\alpha$
over $\mathbb{Q}$ is $x^3 - 10$, and its three roots are $\alpha$, $\alpha' = \zeta\alpha$, and $\alpha'' = \zeta^2\alpha$, where
$\zeta = e^{2\pi i/3}$. The three roots of $f$ are $\beta$, $\beta' = a + b\zeta\alpha + c\zeta^2\alpha^2$, and
$\beta'' = a + b\zeta^2\alpha + c\zeta\alpha^2$, so $f(x) = (x - \beta)(x - \beta')(x - \beta'')$.

(a) Determine $f$ by expanding this product. The terms involving $\alpha$ and $\alpha^2$ have to cancel
out, so they need not be computed.

(b) Determine which elements $\beta$ are algebraic integers.

6. Prove Proposition (6.17). 

7. Prove that the ring of integers in an imaginary quadratic field is a maximal subring of $\mathbb{C}$
with the property of being a lattice in the complex plane.

8. (a) Let $S = \mathbb{Z}[\alpha]$, where $\alpha$ is a complex root of a monic polynomial of degree 2. Prove
that $S$ is a lattice in the complex plane.

(b) Prove the converse: A subring $S$ of $\mathbb{C}$ which is a lattice has the form given in (a).

9. Let $R$ be the ring of integers in the field $\mathbb{Q}[\sqrt{d}]$.

(a) Determine the elements $\alpha \in R$ such that $R = \mathbb{Z}[\alpha]$.

(b) Prove that if $R = \mathbb{Z}[\alpha]$ and if $\alpha$ is a root of the polynomial $x^2 + bx + c$ over $\mathbb{Q}$,
then the discriminant $b^2 - 4c$ is $D$ (6.18).

7. Factorization in Imaginary Quadratic Fields

1. Prove Proposition (7.3) by arithmetic.

2. Prove that the elements $2$, $3$, $1 + \sqrt{-5}$, $1 - \sqrt{-5}$ are irreducible elements of the ring
$\mathbb{Z}[\sqrt{-5}]$.

3. Let $d = -5$. Determine whether or not the lattice of integer linear combinations of the
given vectors is an ideal.

(a) $(5, 1 + \delta)$ (b) $(7, 1 + \delta)$ (c) $(4 - 2\delta, 2 + 2\delta, 6 + 4\delta)$

4. Let A be an ideal of the ring of integers R in an imaginary quadratic field. Prove that 
there is a lattice basis for A one of whose elements is a positive integer. 

5. Let $R = \mathbb{Z}[\sqrt{-5}]$. Prove that the lattice spanned by $(3, 1 + \sqrt{-5})$ is an ideal in $R$,
determine its nonzero element of minimal absolute value, and verify that this ideal has the
form (7.9), Case 2.

6. With the notation of (7.9), show that if $\alpha$ is an element of $R$ such that $\tfrac{1}{2}(\alpha + \alpha\delta)$ is also
in $R$, then $(\alpha, \tfrac{1}{2}(\alpha + \alpha\delta))$ is a lattice basis of an ideal.

7. For each ring $R$ listed below, use the method of Proposition (7.9) to describe the ideals in
$R$. Make a drawing showing the possible shapes of the lattices in each case.

(a) R = /[V^] (b) R = Z[i(l + V^)] (c) R = Z[V^6] (d) R = Z[V^7] 

(e) R = Z[^(l + V^7)] (f) R = 

8. Prove that $R$ is not a unique factorization domain when $d \equiv 2$ (modulo 4) and $d < -2$.

9. Let $d < -3$. Prove that 2 is not a prime element in the ring $\mathbb{Z}[\sqrt{d}]$, but that 2 is
irreducible in this ring.

8. Ideal Factorization 

1. Let $R = \mathbb{Z}[\sqrt{-6}]$. Factor the ideal (6) into prime ideals explicitly.

2. Let $\delta = \sqrt{-3}$ and $R = \mathbb{Z}[\delta]$. (This is not the ring of integers in the imaginary quadratic
number field $\mathbb{Q}[\delta]$.) Let $A$ be the ideal $(2, 1 + \delta)$. Show that $A\bar{A}$ is not a principal ideal,
hence that the Main Lemma is not true for this ring.

3. Let $R = \mathbb{Z}[\sqrt{-5}]$. Determine whether or not 11 is an irreducible element of $R$ and
whether or not (11) is a prime ideal in $R$.

4. Let $R = \mathbb{Z}[\sqrt{-6}]$. Find a lattice basis for the product ideal $AB$, where $A = (2, \delta)$ and
$B = (3, \delta)$.

5. Prove that $A \supset A'$ implies that $AB \supset A'B$.

6. Factor the principal ideal (14) into prime ideals explicitly in $R = \mathbb{Z}[\delta]$, where
$\delta = \sqrt{-5}$.

7. Let $P$ be a prime ideal of an integral domain $R$, and assume that existence of factorizations
is true in $R$. Prove that if $a \in P$ then some irreducible factor of $a$ is in $P$.


9. The Relation Between Prime Ideals of R and Prime 
Integers 


1. Find lattice bases for the prime divisors of 2 and 3 in the ring of integers in (a) $\mathbb{Q}[\sqrt{-14}]$
and (b) $\mathbb{Q}[\sqrt{-23}]$.

2. Let $d = -14$. For each of the following primes $p$, determine whether or not $p$ splits or
ramifies in $R$, and if so, determine a lattice basis for a prime ideal factor of $(p)$:
2, 3, 5, 7, 11, 13.

3. (a) Suppose that a prime integer $p$ remains prime in $R$. Prove that $R/(p)$ is then a field
with $p^2$ elements.

(b) Prove that if $p$ splits in $R$, then $R/(p)$ is isomorphic to the product ring $\mathbb{F}_p \times \mathbb{F}_p$.

4. Let $p$ be a prime which splits in $R$, say $(p) = P\bar{P}$, and let $\alpha \in P$ be any element which is
not divisible by $p$. Prove that $P$ is generated as an ideal by $(p, \alpha)$.

5. Prove Proposition (9.3b).

6. If $d \equiv 2$ or 3 (modulo 4), then according to Proposition (9.3a) a prime integer $p$ remains
prime in the ring of integers of $\mathbb{Q}[\sqrt{d}]$ if the polynomial $x^2 - d$ is irreducible modulo $p$.

(a) Prove the same thing when $d \equiv 1$ (modulo 4) and $p \ne 2$.

(b) What happens to $p = 2$ in this case?

7. Assume that $d \equiv 2$ or 3 (modulo 4). Prove that a prime integer $p$ ramifies in $R$ if and
only if $p = 2$ or $p$ divides $d$.

8. State and prove an analogue of problem 7 when $d$ is congruent to 1 modulo 4.

9. Let $p$ be an integer prime which ramifies in $R$, and say that $(p) = P^2$. Find an explicit
lattice basis for $P$. In which cases is $P$ a principal ideal?

10. A prime integer $p$ might be of the form $a^2 + b^2 d$, with $a, b \in \mathbb{Z}$. Discuss carefully how
this is related to the prime factorization of $(p)$ in $R$.

*11. Prove Proposition (9.1).

10. Ideal Classes in Imaginary Quadratic Fields 

1. Prove that the ideals $A$ and $A'$ are similar if and only if there is a nonzero ideal $C$ such
that $AC$ and $A'C$ are principal ideals.

2. The estimate of Corollary (10.12) can be improved to $|\alpha|^2 \le 2\Delta(L)/\sqrt{3}$, by studying
lattice points in a circle rather than in an arbitrary centrally symmetric convex set. Work
this out.

3. Let $R = \mathbb{Z}[\delta]$, where $\delta^2 = -6$.

(a) Prove that the lattices $P = (2, \delta)$ and $Q = (3, \delta)$ are prime ideals of $R$.

(b) Factor the principal ideal (6) into prime ideals explicitly in $R$.

(c) Prove that the ideal classes of $P$ and $Q$ are equal.

(d) The Minkowski bound for $R$ is $[\mu] = 3$. Using this fact, determine the ideal class
group of $R$.

4. In each case, determine the ideal class group and draw the possible shapes of the lattices.
(a) $d = -10$ (b) $d = -13$ (c) $d = -14$ (d) $d = -15$ (e) $d = -17$
(f) $d = -21$

5. Prove that the values of $d$ listed in Theorem (7.7) have unique factorization.

6. Prove Lemma (10.13). 

7. Derive Corollary (10.14) from Lemma (10.13). 

8. Verify Table (10.24). 

11. Real Quadratic Fields

1. Let $R = \mathbb{Z}[\delta]$, $\delta = \sqrt{2}$. Define a size function on $R$ using the lattice embedding (11.2):
$\sigma(a + b\delta) = a^2 - 2b^2$. Prove that this size function makes $R$ into a Euclidean domain.

2. Let $R$ be the ring of integers in a real quadratic number field, with $d \equiv 2$ or 3
(modulo 4). According to (6.14), $R$ has the form $\mathbb{Z}[x]/(x^2 - d)$. We can also consider the
ring $R' = \mathbb{R}[x]/(x^2 - d)$, which contains $R$ as a subring.

(a) Show that the elements of $R'$ are in bijective correspondence with points of $\mathbb{R}^2$ in
such a way that the elements of $R$ correspond to lattice points.

(b) Determine the group of units of $R'$. Show that the subset $U'$ of $R'$ consisting of the
points on the two hyperbolas $xy = \pm 1$ forms a subgroup of the group of units.

(c) Show that the group of units $U$ of $R$ is a discrete subgroup of $U'$, and show that the
subgroup $U_0$ of units which are in the first quadrant is an infinite cyclic group.

(d) What are the possible structures of the group of units $U$?

3. Let $U_0$ denote the group of units of $R$ which are in the first quadrant in the embedding
(11.2). Find a generator for $U_0$ when (a) $d = 3$, (b) $d = 5$.

4. Prove that if $d$ is a square > 1 then the equation $x^2 - y^2 d = 1$ has no solution except
$x = \pm 1$, $y = 0$.

5. Draw a figure showing the hyperbolas and the units in a reasonable size range for $d = 3$.

12. Some Diophantine Equations 

1. Determine the primes $p$ such that $x^2 + 5y^2 = 2p$ has a solution.

2. Express the assertion of Theorem (12.10) in terms of congruence modulo 20.

3. Prove that if $x^2 \equiv -5$ (modulo $p$) has a solution, then there is an integer point on one of
the two ellipses $x^2 + 5y^2 = p$ or $2x^2 + 2xy + 3y^2 = p$.

4. Determine the conditions on the integers $a, b, c$ such that the linear Diophantine equation
$ax + by = c$ has an integer solution, and if it does have one, find all the solutions.

5. Determine the primes $p$ such that the equation $x^2 + 2y^2 = p$ has an integer solution.

6. Determine the primes $p$ such that the equation $x^2 + xy + y^2 = p$ has an integer solution.

7. Prove that if the congruence $x^2 \equiv -10$ (modulo $p$) has a solution, then the equation
$x^2 + 10y^2 = p^2$ has an integer solution. Generalize.

8. Find all integer solutions of the equation $x^2 + 2 = y^3$.

9. Solve the following Diophantine equations.

(a) $y^2 + 10 = x^3$ (b) $y^2 + 1 = x^3$ (c) $y^2 + 2 = x^3$



1. Prove that there are infinitely many primes congruent to 1 modulo 4.

2. Prove that there are infinitely many primes congruent to -1 (modulo 6) by studying the
factorization of the integer $p_1 p_2 \cdots p_r - 1$, where $p_1, \ldots, p_r$ are the first $r$ primes.

3. Prove that there are infinitely many primes congruent to -1 (modulo 4).

4. (a) Determine the prime ideals of the polynomial ring $\mathbb{C}[x, y]$ in two variables.

(b) Show that unique factorization of ideals does not hold in the ring $\mathbb{C}[x, y]$.

5. Relate proper factorizations of elements in an integral domain to proper factorizations of
principal ideals. Using this relation, state and prove unique factorization of ideals in a
principal ideal domain.

6. Let $R$ be a domain, and let $I$ be an ideal which is a product of distinct maximal ideals in
two ways, say $I = P_1 \cdots P_r = Q_1 \cdots Q_s$. Prove that the two factorizations are the same,
except for the ordering of the terms.

7. Let $R$ be a ring containing $\mathbb{Z}$ as a subring. Prove that if integers $m, n$ are contained in a
proper ideal of $R$, then they have a common integer factor > 1.
*8. (a) Let $\theta$ be an element of the group $\mathbb{R}^+/\mathbb{Z}^+$. Use the Pigeonhole Principle [Appendix
(1.6)] to prove that for every integer n there is an integer b < n such that 
I I < l/bn. 

(b) Show that for every real number $r$ and every $\epsilon > 0$, there is a fraction $m/n$ such that
$|r - m/n| \le \epsilon/n$.

(c) Extend this result to the complex numbers by showing that for every complex number
$\alpha$ and every real number $\epsilon > 0$, there is an element of $\mathbb{Q}(i)$, say $\beta = (a + bi)/n$
with $a, b, n \in \mathbb{Z}$, such that $|\alpha - \beta| \le \epsilon/n$.

(d) Let $\epsilon$ be a positive real number, and for each element $\beta = (a + bi)/n$ of $\mathbb{Q}(i)$,
$a, b, n \in \mathbb{Z}$, consider the disc of radius $\epsilon/n$ about $\beta$. Prove that the interiors of these
discs cover the complex plane.

(e) Extend the method of Proposition (7.9) to prove the finiteness of the class number
for any imaginary quadratic field.

*9. (a) Let $R$ be the ring of functions which are polynomials in $\cos t$ and $\sin t$, with real
coefficients. Prove that $R \approx \mathbb{R}[x, y]/(x^2 + y^2 - 1)$.

(b) Prove that $R$ is not a unique factorization domain.

*(c) Prove that $\mathbb{C}[x, y]/(x^2 + y^2 - 1)$ is a principal ideal domain and hence a unique
factorization domain.

*10. In the definition of a Euclidean domain, the size function $\sigma$ is assumed to have as range
the set of nonnegative integers. We could generalize this by allowing the range to be
some other ordered set. Consider the product ring $R = \mathbb{C}[x] \times \mathbb{C}[y]$. Show that we can
define a size function $R - \{0\} \to S$, where $S$ is the ordered set
$\{0, 1, 2, 3, \ldots;\ \omega, \omega + 1, \omega + 2, \omega + 3, \ldots\}$, so that the division algorithm holds.

*11. Let $\varphi: \mathbb{C}[x, y] \to \mathbb{C}[t]$ be a homomorphism, defined say by $x \mapsto x(t)$, $y \mapsto y(t)$.
Prove that if $x(t)$ and $y(t)$ are not both constant, then $\ker \varphi$ is a nonzero principal ideal.
Modules


Be wise! Generalize! 
Piccayune Sentinel 


1. THE DEFINITION OF A MODULE


Let $R$ be a commutative ring. An $R$-module $V$ is an abelian group with law of composition
written $+$, together with a scalar multiplication $R \times V \to V$, written
$r, v \mapsto rv$, which satisfies these axioms:

(1.1) (i) $1v = v$,

(ii) $(rs)v = r(sv)$,

(iii) $(r + s)v = rv + sv$,

(iv) $r(v + v') = rv + rv'$,

for all $r, s \in R$ and $v, v' \in V$. Notice that these are precisely the axioms for a vector
space. An $F$-module is just an $F$-vector space, when $F$ is a field. So modules are
the natural generalizations of vector spaces to rings. But the fact that elements of a
ring needn't be invertible makes modules more complicated.

The most obvious examples are the modules $R^n$ of $R$-vectors, that is, row or
column vectors with entries in the ring. The laws of composition for $R$-vectors are
the same as for vectors with entries in a field:

$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix}, \qquad r \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} ra_1 \\ \vdots \\ ra_n \end{bmatrix}.$$

The modules thus defined are called free modules. But when $R$ is not a field, it is no
longer true that these are the only modules. There will be modules which are not
isomorphic to any free module, though they are spanned by a finite set.

Let us examine the concept of module in the case that $R$ is the ring of integers
$\mathbb{Z}$. Any abelian group $V$, its law of composition written additively, can be made into
a module over $\mathbb{Z}$ in exactly one way, by the rules

$nv = v + \cdots + v$ = "$n$ times $v$"

and $(-n)v = -(nv)$, for any positive integer $n$. These rules are forced on us by axioms
(1.1), starting with $1v = v$, and they do make $V$ into a $\mathbb{Z}$-module; in other
words, the axioms (1.1) hold. This is intuitively very plausible. To make a formal
proof, we would go back to Peano's axioms. Conversely, any $\mathbb{Z}$-module has the
structure of an abelian group, given by forgetting about its scalar multiplication.
Thus

(1.2) abelian group and $\mathbb{Z}$-module are equivalent concepts.

We must use additive notation in the abelian group in order to make this correspondence
seem natural.

The ring of integers provides us with examples to show that modules over a
ring need not be free. No finite abelian group except the zero group is isomorphic to
a free module $\mathbb{Z}^n$, because $\mathbb{Z}^n$ is infinite if $n > 0$ and $\mathbb{Z}^0 = 0$.
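
As an illustration of (1.2), here is a minimal Python sketch of the forced scalar multiplication, assuming the abelian group is the cyclic group of integers modulo 6. The names `scalar_mul`, `add`, `neg` and the choice of modulus are ours, not the text's. It also shows why this finite group cannot be free: a nonzero element is killed by a nonzero integer.

```python
MODULUS = 6  # illustrative choice: the cyclic group Z/6Z, written additively


def add(a, b):
    """Law of composition in Z/6Z."""
    return (a + b) % MODULUS


def neg(a):
    """Additive inverse in Z/6Z."""
    return (-a) % MODULUS


def scalar_mul(n, v):
    """Compute n*v as 'n times v' by repeated addition; (-n)v = -(nv)."""
    result = 0  # the identity element of the group
    for _ in range(abs(n)):
        result = add(result, v)
    return neg(result) if n < 0 else result


# The element 2 is nonzero, yet 3 * 2 = 0, so (2) is not an independent set.
print(scalar_mul(3, 2))
```

The loop is deliberately naive so that it mirrors the rule "$n$ times $v$"; it makes no use of the ring structure on the integers modulo 6.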

The remainder of this section extends some of our basic terminology to modules.
A submodule of an $R$-module $V$ is a nonempty subset which is closed under addition
and scalar multiplication. We have seen submodules in one case before,
namely ideals.

(1.3) Proposition. The submodules of the $R$-module $R^1$ are the ideals of $R$.

Proof. By definition, an ideal is a subset of $R$ which is closed under addition
and under multiplication by elements of $R$. □

The definition of homomorphism of $R$-modules copies that of linear transformation
of vector spaces. A homomorphism $\varphi: V \to W$ of $R$-modules is a map which
is compatible with the laws of composition

(1.4) $\varphi(v + v') = \varphi(v) + \varphi(v')$ and $\varphi(rv) = r\varphi(v)$,

for all $v, v' \in V$ and $r \in R$. A bijective homomorphism is called an isomorphism.
The kernel of a homomorphism $\varphi: V \to W$ is a submodule of $V$, and the image of
$\varphi$ is a submodule of $W$.

The proof given for vector spaces [Chapter 4 (2.1)] shows that every homomorphism
$\varphi: R^m \to R^n$ of free modules is left multiplication by a matrix whose
entries are in $R$.
We also need to extend the concept of quotient group to modules. Let $R$ be a
ring, and let $W$ be a submodule of an $R$-module $V$. The quotient $V/W$ is the additive
group of cosets [Chapter 2 (9.5)] $\bar{v} = v + W$. It is made into an $R$-module by the
rule

(1.5) $r\bar{v} = \overline{rv}$.

We have made such constructions several times before. The facts we will need are
collected together below.


(1.6) Proposition.

(a) The rule (1.5) is well-defined, and it makes $\bar{V} = V/W$ into an $R$-module.

(b) The canonical map $\pi: V \to \bar{V}$ sending $v \mapsto \bar{v}$ is a surjective homomorphism
of $R$-modules, and its kernel is $W$.

(c) Mapping property: Let $f: V \to V'$ be a homomorphism of $R$-modules whose
kernel contains $W$. There is a unique homomorphism $\bar{f}: \bar{V} \to V'$ such that
$f = \bar{f}\pi$.

(d) First Isomorphism Theorem: If $\ker f = W$, then $\bar{f}$ is an isomorphism from $\bar{V}$ to
the image of $f$.

(e) Correspondence Theorem: There is a bijective correspondence between submodules
$\bar{S}$ of $\bar{V}$ and submodules $S$ of $V$ which contain $W$, defined by
$S = \pi^{-1}(\bar{S})$ and $\bar{S} = \pi(S)$. If $S$ and $\bar{S}$ are corresponding modules, then $V/S$ is
isomorphic to $\bar{V}/\bar{S}$.


We already know the analogous facts for groups and normal subgroups. All that remains
to be checked in each part is that scalar multiplication is well-defined, satisfies
the axioms for a module, and is compatible with the maps. These verifications follow
the pattern set previously. □


2. MATRICES, FREE MODULES, AND BASES


Matrices with entries in a ring can be manipulated in the same way as matrices with 
entries in a field. That is, the operations of matrix addition and multiplication are 
defined as in Chapter 1, and they satisfy similar rules. A matrix with entries in a 
ring R is often called an R-matrix. 

Let us ask which $R$-matrices are invertible. The determinant of an $n \times n$
$R$-matrix $A = (a_{ij})$ can be computed by any of the old rules. It is convenient to use the
complete expansion [Chapter 1 (4.12)], because it exhibits the determinant as a
polynomial in the $n^2$ matrix entries. So we write

(2.1) $\det A = \sum_p \pm\, a_{1p(1)} \cdots a_{np(n)}$,

the sum being over all permutations $p$ of the set $\{1, \ldots, n\}$, and the symbol $\pm$ standing
for the sign of the permutation. Evaluating this formula on an $R$-matrix, we obtain
an element of $R$. The usual rules for determinant apply, in particular

$\det AB = (\det A)(\det B)$.

We have proved this rule when the matrix entries are in a field [Chapter 1 (3.16)],
and we will discuss the reason that such formulas carry over to rings in the next section.
Let us assume for now that they do carry over.
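
The complete expansion (2.1) translates directly into code. The following Python sketch is our own illustration (the helper names `sign` and `det` are assumptions, not the book's notation). It uses only addition and multiplication of the entries, so it makes sense for entries in any commutative ring whose elements support `+` and `*`; here it is tried on integer matrices.

```python
from itertools import permutations


def sign(p):
    """Sign of a permutation p, given as a tuple of 0-based images."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:  # each inversion flips the sign
                s = -s
    return s


def det(A):
    """Determinant of a square matrix by the complete expansion (2.1)."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term = term * A[i][p[i]]
        total = total + term
    return total


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
AB = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print(det(A), det(B), det(AB))
```

The expansion has $n!$ terms, so this is a sketch for small matrices only; its value is that no division is ever performed.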

If $A$ has a multiplicative inverse $A^{-1}$ with entries in $R$, then

$(\det A)(\det A^{-1}) = \det I = 1$.

This shows that the determinant of an invertible $R$-matrix is a unit of the ring. Conversely,
let $A$ be an $R$-matrix whose determinant $\delta$ is a unit. Then we can find its inverse
by Cramer's Rule: $\delta I = A(\operatorname{adj} A)$, where the adjoint matrix is calculated from
$A$ by taking determinants of minors [Chapter 1 (5.4)]. This rule also holds in any
ring. So if $\delta$ is a unit, we can solve for $A^{-1}$ in $R$ as

$A^{-1} = \delta^{-1}(\operatorname{adj} A)$.

(2.2) Corollary. The invertible $n \times n$ matrices $A$ with entries in $R$ are those matrices
whose determinant is a unit. They form a group

$GL_n(R) = \{\text{invertible } n \times n \ R\text{-matrices}\}$,

called the general linear group over $R$. □
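
For the ring of integers, Corollary (2.2) can be tried out concretely. The sketch below is our own illustration with hypothetical helper names (`minor`, `det`, `adjugate`, `inverse_over_Z`, `matmul`). It computes the adjoint matrix from determinants of minors and returns an integer inverse exactly when the determinant is one of the units 1 or -1.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion along the first row (no division used)."""
    if len(A) == 0:
        return 1  # determinant of the empty matrix
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def adjugate(A):
    """Adjoint matrix: entry (i, j) is the signed determinant of the (j, i) minor."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]


def inverse_over_Z(A):
    """Integer inverse of A if det A is a unit of Z, otherwise None."""
    d = det(A)
    if d not in (1, -1):
        return None
    # For d = 1 or -1 the inverse of d in Z is d itself.
    return [[d * x for x in row] for row in adjugate(A)]


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[2, 1], [1, 1]]
print(inverse_over_Z(A))                   # determinant 1, so an integer inverse exists
print(inverse_over_Z([[2, -1], [1, 2]]))   # determinant 5 is not a unit of Z
```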


The fact that the determinant of an invertible matrix must be a unit is a strong
condition on the matrix when $R$ has few units. For instance, if $R$ is the ring of integers,
the determinant must be $\pm 1$. Most integer matrices are invertible real matrices,
so they are in $GL_n(\mathbb{R})$. But unless the determinant is $\pm 1$, the entries of the inverse
matrix won't be integers, so the inverses will not be in $GL_n(\mathbb{Z})$. Nevertheless,
there are always reasonably many invertible matrices if $n > 1$, because the elementary
matrices

$$I + ae_{ij} = \begin{bmatrix} 1 & & a \\ & \ddots & \\ & & 1 \end{bmatrix}, \quad i \ne j,\ a \in R,$$

have determinant 1. These matrices generate a good-sized group. The other elementary
matrices, the transposition matrices and the matrices

$$\begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & u & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix}, \quad u \text{ a unit in } R,$$

are also invertible.
We now return to the discussion of modules over a ring $R$. The concepts of basis
and independence (Chapter 3, Section 3) can be carried over from vector spaces
to modules without change: An ordered set $(v_1, \ldots, v_k)$ of elements of a module $V$ is
said to generate (or span) $V$ if every $v \in V$ is a linear combination:

(2.3) $v = r_1 v_1 + \cdots + r_k v_k$, with $r_i \in R$.

In that case the elements $v_i$ are called generators. A module $V$ is said to be finitely
generated if there exists a finite set of generators. Most of the modules we study will
be finitely generated. A $\mathbb{Z}$-module $V$ is finitely generated if and only if it is a finitely
generated abelian group in the sense of Chapter 6, Section 8.

We saw in Section 1 that modules needn't be isomorphic to any of the modules
$R^k$. However, a given module may happen to be, and if so, it is called a free module
too. Thus a finitely generated module $V$ is free if there is an isomorphism

$\varphi: R^n \xrightarrow{\ \sim\ } V$.

For instance, lattices in $\mathbb{R}^2$ are free $\mathbb{Z}$-modules, whereas finite, nonzero abelian
groups are not free. A free $\mathbb{Z}$-module is also called a free abelian group. Free modules
form an important and natural class, and we will study them first. We will study
general modules beginning in Section 5.

Following the definitions for vector spaces, we call a set of elements $(v_1, \ldots, v_n)$
of a module $V$ independent if no nontrivial linear combination is zero,
that is, if the following condition holds:

(2.4) If $r_1 v_1 + \cdots + r_n v_n = 0$, with $r_i \in R$, then $r_i = 0$ for $i = 1, \ldots, n$.


The set is a basis if it is both independent and a generating set. The standard basis
$E = (e_1, \ldots, e_k)$ is a basis of $R^k$. Exactly as with vector spaces, $(v_1, \ldots, v_k)$ is a basis
if every $v \in V$ is a linear combination (2.3) in a unique way.

We may also speak of linear combinations and linear independence of infinite 
sets, using the terminology of Chapter 3, Section 5. 

Let us denote the ordered set $(v_1, \ldots, v_n)$ by $B$, as in Chapter 3, Section 3. Then
multiplication by $B$,

$$BX = (v_1, \ldots, v_n) \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = v_1 x_1 + \cdots + v_n x_n,$$

defines a homomorphism of modules

(2.5) $R^n \to V$.

This homomorphism is surjective if and only if the set $(v_1, \ldots, v_n)$ generates $V$, and
injective if and only if it is independent. Thus it is bijective if and only if $B$ is a basis
of $V$, in which case $V$ is a free module. So a module $V$ has a basis if and only if it is
free. Most modules have no bases.
Computation with bases of free $R$-modules can be done in much the same way
as with bases of vector spaces, using matrices with entries in $R$. In particular, we
can speak of the coordinate vector of an element $v \in V$, with respect to a basis
$B = (v_1, \ldots, v_n)$. It is the unique column vector $X \in R^n$ such that

$v = BX = v_1 x_1 + \cdots + v_n x_n$.

If two bases $B = (v_1, \ldots, v_n)$ and $B' = (v'_1, \ldots, v'_r)$ for the same free module $V$
are given, then the matrix of change of basis is obtained as in Chapter 3, Section 4
by writing the elements $v_j$ of the first basis as linear combinations of the second basis:
$B = B'P$, or

(2.6) $v_j = \sum_{i=1}^{r} v'_i\, p_{ij}$.

As with vector spaces, any two bases of the same free module have the same
cardinality, provided that $R$ is not the zero ring. Thus $n = r$ in
the above bases. This can be proved by considering the inverse matrix $Q = (q_{ij})$
which is obtained by writing $B'$ in terms of $B$: $B' = BQ$. Then

$B = B'P = BQP$.

Since $B$ is a basis, there is only one way to write $v_j$ as a linear combination of
$(v_1, \ldots, v_n)$, and that is $v_j = 1v_j$, or $B = BI$. Therefore $QP = I$, and similarly
$PQ = I$: The matrix of change of basis is an invertible $R$-matrix.

Now $P$ is an $r \times n$ matrix, and $Q$ is an $n \times r$ matrix. Suppose that $r > n$. Then
we make $P$ and $Q$ square by adding zeros:

$$\begin{bmatrix} P & 0 \end{bmatrix} \begin{bmatrix} Q \\ 0 \end{bmatrix} = I.$$

This does not change the product $PQ$. But the determinants of these square matrices
are zero, so they are not invertible, because $R \ne 0$. This contradiction rules out
$r > n$, and interchanging the roles of the two bases rules out $n > r$. So $r = n$, as
claimed.

It is a startling fact that there exist noncommutative rings $R$ for which the modules
$R^n$ for $n = 1, 2, 3, \ldots$ are all isomorphic (see miscellaneous exercise 6). Determinants
do not work well unless the matrix entries commute.

Unfortunately, most concepts relating to vector spaces have different names
when used for modules over rings, and it is too late to change them. The number of
elements of a basis for a free module $V$ is called the rank of $V$, instead of the dimension.

As we have already remarked, every homomorphism $\varphi: R^n \to R^m$ between
column vectors is left multiplication by a matrix $A$. If $\varphi: V \to W$ is a homomorphism
of free modules with bases $B = (v_1, \ldots, v_n)$ and $C = (w_1, \ldots, w_m)$ respectively,
then the matrix of the homomorphism is defined to be $A = (a_{ij})$, where

(2.7) $\varphi(v_j) = \sum_i w_i a_{ij}$

as before [Chapter 4 (2.3)]. A change of the bases $B, C$ by invertible $R$-matrices $P, Q$
changes the matrix of $\varphi$ to $A' = QAP^{-1}$ [Chapter 4 (2.7)].

3. THE PRINCIPLE OF PERMANENCE OF IDENTITIES 

In this section, we address the following question: Why do the properties of matrices
with entries in a field continue to hold when the entries are in an arbitrary
ring? Briefly, the reason is that they are identities, which means that they hold when
the matrix entries are replaced by variables. To be more precise, assume we want to
prove some identity such as the multiplicative property of the determinant,
$(\det A)(\det B) = \det(AB)$, or Cramer's Rule. Suppose that we have already checked
the identity for matrices with complex entries. We don't want to do the work again,
and anyhow we may have used special properties of $\mathbb{C}$, such as the field axioms, the
fact that every complex polynomial has a root, or the fact that $\mathbb{C}$ has characteristic
zero, to check the identity there. We did use special properties to prove the identities
mentioned, so the proofs we gave will not work for rings. We are now going to
show how to deduce such identities for all rings from the same identities for the
complex numbers.

The principle is very general, but in order to focus attention, let us concentrate
on the identity $(\det A)(\det B) = \det(AB)$. We begin by replacing the matrix entries
with variables. So we consider the same identity

$(\det X)(\det Y) = \det(XY)$,

where $X$ and $Y$ denote $n \times n$ matrices with variable entries. Then we can substitute
elements in any ring $R$ for these variables. Formally, the substitution is defined in
terms of the ring of integer polynomials $\mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}]$ in $2n^2$ variable matrix entries.
There is a unique homomorphism from the ring of integers to any ring $R$ [Chapter
10 (3.9)]. Given matrices $A = (a_{ij})$, $B = (b_{k\ell})$ with entries in $R$, there is a homomorphism

(3.1) $\mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}] \to R$,

the substitution homomorphism, which sends $x_{ij} \mapsto a_{ij}$ and $y_{k\ell} \mapsto b_{k\ell}$ [Chapter 10
(3.4)]. Our variable matrices have entries in the polynomial ring, and it is natural to
say that the homomorphism sends $X \mapsto A$ and $Y \mapsto B$, meaning that the entries of
$X = (x_{ij})$ are mapped to the entries of $A = (a_{ij})$ and so on, by the map.

The general principle we have in mind is this: Suppose we want to prove an
identity, all of whose terms are polynomials with integer coefficients in the matrix
entries. Then the terms are compatible with ring homomorphisms: For example, if a
homomorphism $\varphi: R \to R'$ sends $A \mapsto A'$ and $B \mapsto B'$, then it sends
$\det A \mapsto \det A'$. To see this, note that the complete expansion of the determinant is

$\det A = \sum_p \pm\, a_{1p(1)} \cdots a_{np(n)}$,

the summation being over all permutations $p$. Since $\varphi$ is a homomorphism,

$\varphi(\det A) = \sum_p \pm\, \varphi(a_{1p(1)} \cdots a_{np(n)}) = \sum_p \pm\, a'_{1p(1)} \cdots a'_{np(n)} = \det A'$.

Obviously, this is a general principle. Consequently, if our identity holds for the
$R$-matrices $A, B$, then it also holds for the $R'$-matrices $A', B'$.

Now for every pair of matrices $A, B$, we have the homomorphism (3.1) which
sends $X \mapsto A$ and $Y \mapsto B$. We substitute $\mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}]$ for $R$ and $R$ for $R'$ in the
principle just described. We conclude that if the identity holds for the variable matrices
$X, Y$ in $\mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}]$, then it holds for every pair of matrices in any ring $R$:

(3.2) To prove our identity in general, we need only prove it
for the variable matrices $X, Y$ in the ring $\mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}]$.

To prove it for variable matrices, we consider the ring of integers as a subring
of the field of complex numbers, noting the inclusion of polynomial rings

$\mathbb{Z}[\{x_{ij}\}, \{y_{k\ell}\}] \subset \mathbb{C}[\{x_{ij}\}, \{y_{k\ell}\}]$.

We may as well check our identity in the bigger ring. Now by hypothesis, our identity
is equivalent to the equality of certain polynomials in the variables $\{x_{ij}\}, \{y_{k\ell}\}$.
Let us write the identity as $f(x_{ij}, y_{k\ell}) = 0$. The symbol $f$ may stand for several
polynomials.

We now consider the polynomial function corresponding to the polynomial
$f(x_{ij}, y_{k\ell})$, call it $\tilde{f}(x_{ij}, y_{k\ell})$. If the identity has been proved for all complex matrices,
then it follows that $\tilde{f}(x_{ij}, y_{k\ell})$ is the zero function. We apply the fact [Chapter 10
(3.8)] that a polynomial is determined by the function it defines to conclude that
$f(x_{ij}, y_{k\ell}) = 0$, and we are done.
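
As a sanity check on the conclusion, one can test the multiplicative property of the determinant in a ring that is far from being a field. The Python sketch below is our own illustration; the choice of the ring of integers modulo 4 and of $2 \times 2$ matrices, and the function names, are assumptions made to keep the exhaustive search small. A finite check like this is of course not a proof of the principle, only an instance of it.

```python
from itertools import product

N = 4  # illustrative ring Z/4Z: it has zero divisors, so it is not a field


def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]] over Z/NZ."""
    return (a * d - b * c) % N


def check_multiplicativity():
    """Test det(AB) = det(A) det(B) for every pair of 2 x 2 matrices over Z/NZ."""
    for a, b, c, d, e, f, g, h in product(range(N), repeat=8):
        # Entries of the product AB, with A = [[a, b], [c, d]] and B = [[e, f], [g, h]].
        p, q = (a * e + b * g) % N, (a * f + b * h) % N
        r, s = (c * e + d * g) % N, (c * f + d * h) % N
        if det2(p, q, r, s) != (det2(a, b, c, d) * det2(e, f, g, h)) % N:
            return False
    return True


print(check_multiplicativity())
```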

It is possible to formalize the above discussion and to prove a precise theorem
concerning the validity of identities in an arbitrary ring. However, even mathematicians
occasionally feel that it isn't worthwhile making a precise formulation, that it
is easier to consider each case as it comes along. This is one of those occasions.

4. DIAGONALIZATION OF INTEGER MATRICES

In this section we discuss simplification of an $m \times n$ integer matrix $A = (a_{ij})$ by a
succession of elementary operations. We will apply this procedure later to classify
abelian groups. The same method will work for matrices with entries in a Euclidean
domain and, with some modification, for matrices with entries in a principal ideal
domain.

The best results are obtained if we allow both row and column operations together.
So we allow these operations:

(4.1)

(i) add an integer multiple of one row to another, or add an integer multiple of
one column to another;

(ii) interchange two rows or two columns;

(iii) multiply a row or a column by a unit.


Of course, the units in $\mathbb{Z}$ are $\pm 1$. Any such operation can be made by multiplying $A$
on the left or right by a suitable elementary integer matrix. The result of a sequence
of these operations will have the form

(4.2) $A' = QAP^{-1}$,

where $Q \in GL_m(\mathbb{Z})$ and $P^{-1} \in GL_n(\mathbb{Z})$ are products of elementary integer matrices.
Needless to say, we could drop the inverse symbol from $P$. We put it there because
we will want to interpret the operation as a change of basis.

Over a field, any matrix can be brought into the block form

$$A' = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$

by such operations [Chapter 4 (2.9)]. We cannot hope for such a result when working
with integers. We can't even do it for $1 \times 1$ matrices. But we can diagonalize:

(4.3) Theorem. Let $A$ be an $m \times n$ integer matrix. There exist products $Q, P$ of
elementary integer matrices as above, so that $A' = QAP^{-1}$ is diagonal:

$$A' = \begin{bmatrix} d_1 & & & \\ & \ddots & & \\ & & d_r & \\ & & & 0 \end{bmatrix},$$

where the diagonal entries $d_i$ are nonnegative and where each diagonal entry divides
the next: $d_1 \mid d_2$, $d_2 \mid d_3, \ldots$.


Proof. The strategy is to perform a sequence of operations so as to end up with
a matrix

(4.4)
$$\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{bmatrix}$$

in which $d_1$ divides every entry of $B$. When this is done, we work on $B$. The process
is based on repeated division with remainder. We will describe a systematic method,
though using this method is usually not the quickest way to proceed.

We may assume $A \ne 0$.
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Step 1: By permuting rows and columns, move a nonzero entry with smallest abso¬ 
lute value to the upper left corner. Multiply the first row by -1 if necessary, so that 
this upper left entry a n becomes positive. 

We now try to clear out the first row and column• Whenever an operation pro¬ 
duces a nonzero entry in the matrix whose absolute value is smaller than \au |, we 
go back to Step 1 and start the whole process over. This is likely to spoil the work we 
have done to clear out matrix entries * However，progress is being made because the 
size of an is reduced every time. We will not have to return to Step 1 infinitely 
often. 

Step 2: Choose a nonzero entry $a_{i1}$ in the first column, with $i > 1$, and divide by
$a_{11}$:

$$a_{i1} = a_{11}q + r,$$

where $0 \le r < a_{11}$. Subtract $q$ times (row 1) from (row $i$). This changes $a_{i1}$ to $r$.

If $r \neq 0$, we go back to Step 1. If $r = 0$, we have produced a zero in the first
column. Finitely many repetitions of Steps 1 and 2 result in a matrix in which $a_{i1} = 0$
for all $i > 1$. Similarly, we may use the analogue of Step 2 for column operations
to clear out the first row, eventually ending up with a matrix in which the only
nonzero entry in the first row and column is $a_{11}$, as required by (4.3). However, $a_{11}$
may not yet divide every entry of the matrix $B$ (4.4).

Step 3: Assume that $a_{11}$ is the only nonzero entry in the first row and column, but
that some entry $b$ of $B$ is not divisible by $a_{11}$. Add the column of $A$ which contains $b$
to column 1. This produces an entry $b$ in the first column.

We go back to Step 2. Division with remainder will now produce a smaller matrix
entry, sending us back to Step 1. A finite sequence of these steps will produce a
matrix of the form (4.4), allowing us to proceed by induction. □
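The systematic method of this proof translates into a short program. The following Python sketch is an illustration under stated assumptions, not part of the original text: the function name `diagonalize` and the list-of-rows representation are choices made here, and the matrices $Q$ and $P$ are not recorded, only the diagonal result.

```python
def diagonalize(A):
    """Return a diagonal form of the nonempty integer matrix A (a list of rows).

    Follows Steps 1-3 of the proof; the matrices Q and P are not recorded.
    """
    A = [row[:] for row in A]
    m, n = len(A), len(A[0])
    for t in range(min(m, n)):          # work on the block of rows, columns >= t
        while True:
            # Step 1: move a nonzero entry of smallest absolute value to (t, t).
            nonzero = [(abs(A[i][j]), i, j)
                       for i in range(t, m) for j in range(t, n) if A[i][j] != 0]
            if not nonzero:
                break                   # the remaining block is zero
            _, i, j = min(nonzero)
            A[t], A[i] = A[i], A[t]
            for row in A:
                row[t], row[j] = row[j], row[t]
            if A[t][t] < 0:
                A[t] = [-x for x in A[t]]
            d = A[t][t]
            # Step 2: division with remainder down the column and along the row.
            for i in range(t + 1, m):
                q = A[i][t] // d
                A[i] = [x - q * y for x, y in zip(A[i], A[t])]
            for j in range(t + 1, n):
                q = A[t][j] // d
                for row in A:
                    row[j] -= q * row[t]
            if (any(A[i][t] for i in range(t + 1, m))
                    or any(A[t][j] for j in range(t + 1, n))):
                continue                # a remainder smaller than d appeared
            # Step 3: if d does not divide some entry b of B, add b's column to column t.
            bad = [j for i in range(t + 1, m) for j in range(t + 1, n) if A[i][j] % d]
            if not bad:
                break
            for row in A:
                row[t] += row[bad[0]]
    return A
```

For the matrix $A$ of Example (4.5), `diagonalize([[2, -1], [1, 2]])` returns `[[1, 0], [0, 5]]`. As the proof warns, this systematic route is usually not the shortest one by hand.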


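(4.5) Example. We do not follow the systematic method:

$$A = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}
\xrightarrow{\text{column oper}} \begin{bmatrix} 1 & -1 \\ 3 & 2 \end{bmatrix}
\xrightarrow{\text{column oper}} \begin{bmatrix} 1 & 0 \\ 3 & 5 \end{bmatrix}
\xrightarrow{\text{row oper}} \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix} = A'.$$

Here

$$Q = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix} \quad \text{and} \quad P^{-1} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}.$$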

Note that the key ingredient in this proof is the division algorithm. The same
proof will work when $\mathbb{Z}$ is replaced by any Euclidean domain.


(4.6) Theorem. Let $R$ be a Euclidean domain, for instance a polynomial ring $F[t]$
in one variable over a field. Let $A$ be an $m \times n$ matrix with entries in $R$. There are
products $Q, P$ of elementary $R$-matrices such that $A' = QAP^{-1}$ is diagonal and such
that each diagonal entry of $A'$ divides the next: $d_1 \mid d_2 \mid d_3 \mid \cdots$. If $R = F[t]$, we can
normalize by requiring the polynomials $d_i$ to be monic. □

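(4.7) Example. Diagonalization of a matrix of polynomials:

$$A = \begin{bmatrix} t^2-3t+2 & t-2 \\ (t-1)^3 & t^2-3t+2 \end{bmatrix}
\xrightarrow{\text{row oper}} \begin{bmatrix} t^2-3t+2 & t-2 \\ (t-1)^2 & 0 \end{bmatrix}
\xrightarrow{\text{row oper}} \begin{bmatrix} -t+1 & t-2 \\ (t-1)^2 & 0 \end{bmatrix}$$

$$\xrightarrow{\text{column oper}} \begin{bmatrix} -1 & t-2 \\ (t-1)^2 & 0 \end{bmatrix}
\xrightarrow{\text{column oper}} \begin{bmatrix} -1 & 0 \\ (t-1)^2 & (t-1)^2(t-2) \end{bmatrix}
\xrightarrow{\text{row oper}} \begin{bmatrix} 1 & 0 \\ 0 & (t-1)^2(t-2) \end{bmatrix}.$$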

In both examples, we ended up with 1 in the upper left corner. This isn't
surprising. The matrix entries will often have greatest common divisor 1.

The diagonalization of integer matrices can be used to describe homomorphisms
between free abelian groups. As we have already remarked (2.8), a homomorphism
$\varphi: V \to W$ of free abelian groups is described by a matrix, once bases
for $V$ and $W$ are chosen. A change of bases in $V$, $W$ by invertible integer matrices
$P$, $Q$ changes $A$ to $A' = QAP^{-1}$. So we have proved the following theorem:


(4.8) Theorem. Let $\varphi: V \to W$ be a homomorphism of free abelian groups.
There exist bases of $V$ and $W$ such that the matrix of the homomorphism has the
diagonal form (4.3). □


In the rest of this section, we will investigate the meaning of this theorem for two
auxiliary groups associated to a homomorphism: its kernel and its image.

Let $\varphi: \mathbb{Z}^n \to \mathbb{Z}^m$ be left multiplication by the $m \times n$ integer matrix $A$. The
kernel of $\varphi$ is the subgroup of $\mathbb{Z}^n$ of integer solutions of the system of linear
equations

(4.9) $AX = 0.$

These solutions can be read off immediately when the matrix is diagonal: In order
for $X$ to solve the diagonal system $d_1x_1 = 0, \ldots, d_nx_n = 0$, we must have $x_i = 0$
unless $d_i = 0$, and if $d_i = 0$, then $x_i$ can be arbitrary.

To solve (4.9) in general, we may diagonalize $A$, say to $A' = QAP^{-1}$, where
$Q, P$ are products of elementary integer matrices. We make the change of variable
$X' = PX$ and solve the diagonal system

$$A'X' = QAP^{-1}X' = 0.$$

Since $Q$ is invertible, the system of equations $QAX = 0$ has the same solutions as the
system $AX = 0$. So the solutions of the original system are $X = P^{-1}X'$.
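The recipe can be traced on a small case. In the sketch below the matrix `A` and the matrices `Q`, `P_inv` are hypothetical, chosen by hand for illustration (they do not come from the text); the diagonal form has $d_1 = 1$ and $d_2 = 0$, so $x_1' = 0$ and $x_2'$ is free.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 2], [2, 4]]          # hypothetical example matrix
Q = [[1, 0], [-2, 1]]         # subtracts 2*(row 1) from row 2
P_inv = [[1, -2], [0, 1]]     # subtracts 2*(column 1) from column 2

A_diag = matmul(matmul(Q, A), P_inv)    # A' = Q A P^{-1}

def kernel_vector(t):
    """Return X = P^{-1} X' for the diagonal solution X' = (0, t)."""
    X_prime = [[0], [t]]      # x1' = 0 since d1 = 1; x2' = t is arbitrary since d2 = 0
    return matmul(P_inv, X_prime)
```

With these choices `A_diag` is `[[1, 0], [0, 0]]`, and `kernel_vector(1)` is the column `[[-2], [1]]`, which $A$ sends to zero.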

Next, let us examine the image of $\varphi: \mathbb{Z}^n \to \mathbb{Z}^m$, the map defined by multiplication
by the integer matrix $A$ as before. We can describe this image as the set of
vectors $B \in \mathbb{Z}^m$ such that the system of integer equations $AX = B$ has an integer
solution. We will often denote this image by $A\mathbb{Z}^n$. Multiplication by $A$ sends the basis
vectors $e_1, \ldots, e_n \in \mathbb{Z}^n$ to the columns

(4.10) $A_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix}, \ \ldots, \ A_n = \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix}$

of $A$, so the image is the set of integer linear combinations of these columns. In other
words, the columns generate the image.

We can turn this description around, starting with an arbitrary subgroup $S$ of
the free abelian group $\mathbb{Z}^m$ which is given to us explicitly by a set of generators
$A_1, \ldots, A_n \in \mathbb{Z}^m$. Let $A$ be the matrix whose columns are $A_j$. Then $S$ is the image of
left multiplication by $A$. This interpretation of $S$ as the image of a homomorphism
tells us the meaning of left and right multiplication by invertible integer matrices $Q$
and $P^{-1}$: Left multiplication by $Q$ corresponds to a change of basis in the module $\mathbb{Z}^m$,
the range of the map. Its effect is to multiply each of the generators $A_j$ by $Q$. On the
other hand, right multiplication by $P^{-1}$ represents a change of basis in the domain
$\mathbb{Z}^n$. This changes the generating set of $S$. For example, adding $r$ times column 1 to
column 2 changes $A_2$ to $A_2' = A_2 + rA_1$ and leaves the other generators unchanged.
Combining these observations with diagonalization results in the following theorem:

(4.11) Theorem. Let $S$ be a subgroup of a free abelian group $W$ of rank $m$. There
is a basis $(w_1, \ldots, w_m)$ of $W$ and a basis $(u_1, \ldots, u_n)$ of $S$ with the following properties:
(i) $n \le m$, (ii) for each $j \le n$ there is a positive integer $d_j$ such that $u_j = d_jw_j$, and
(iii) $d_1 \mid d_2 \mid d_3 \cdots$.

(4.12) Corollary. Every subgroup of a free abelian group of rank $m$ is free, and
its rank is at most $m$. □

Proof of Theorem (4.11). Roughly speaking, we need only choose a basis
$\mathbf{B} = (w_1, \ldots, w_m)$ for $W$ and a set of generators $(u_1, \ldots, u_n)$ for $S$, to obtain an $m \times n$
matrix $A$ which represents $S$ as above. The diagonalization theorem gives us a diagonal
matrix $A' = QAP^{-1}$ representing $S$ with respect to a new basis $\mathbf{B}' = (w_1', \ldots, w_m')$
and new generating set $(u_1', \ldots, u_n')$. Then $u_j' = d_jw_j'$. We drop the primes to obtain
the basis and generating set required. This completes the proof except for three
points.

First, we may have $n > m$, that is, there may be more columns than rows. But
if so, then since $A'$ is diagonal, its $j$th column is zero for each $j > m$; hence the
corresponding generator $u_j$ is zero too. The zero element is useless as a generator, so we
throw it out. For the same reason, we may throw out a generator $u_j$ whenever
$d_j = 0$. After we do this, all $d_j$ will be positive, and we will have $n \le m$.

Notice that if $S$ is the zero subgroup, we will end up throwing out all the
generators. As with vector spaces, we must adopt the convention that the empty set
generates the zero module, or else make a special mention of this exceptional case in
the statement of the theorem.

Next, we verify that if the basis and generating set are chosen so that $d_j > 0$
and $n \le m$, then $(u_1, \ldots, u_n)$ is a basis of $S$. Since it generates $S$, what has to be
proved is that $(u_1, \ldots, u_n)$ is independent. We rewrite a linear relation
$r_1u_1 + \cdots + r_nu_n = 0$ in the form $r_1d_1w_1 + \cdots + r_nd_nw_n = 0$. Since $(w_1, \ldots, w_m)$
is a basis, $r_id_i = 0$ for each $i$, and since $d_i > 0$, $r_i = 0$.

The final point is more serious: We need a finite set of generators of $S$ to get
started. How do we know that there is such a set? It is a fact that every subgroup of
a finitely generated abelian group is itself finitely generated. We will prove this in
Section 5. For the moment, the theorem is proved only with the additional hypothesis
that $S$ is finitely generated. The hypothesis that $W$ is finitely generated cannot be
removed. □


Theorem (4.11) is quite explicit. Let $S$ be the subgroup of $\mathbb{Z}^m$ generated by the
columns of a matrix $A$, and suppose that $A' = QAP^{-1}$ is diagonal. To display $S$ in the
form asserted in the theorem, we rewrite this equation in the form

(4.13) $Q^{-1}A' = AP^{-1},$

and we interpret it as follows: The columns of the matrix $AP^{-1}$ form our new set of
generators for $S$. Since the matrix $A'$ is diagonal, (4.13) tells us that the new generators
are multiples of the columns of $Q^{-1}$. We change the basis of $\mathbb{Z}^m$ from the standard
basis to the basis made up of the columns of $Q^{-1}$. The matrix of this change of
basis is $Q$ [see Chapter 3 (4.21)]. Then the new generators are multiples of the new
basis elements.
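Equation (4.13) is easy to check numerically. The sketch below uses the matrices of Example (4.5); `Q_inv` is the inverse of the matrix $Q$ found there, and the helper `matmul` and the variable names are introduced here for illustration only.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, -1], [1, 2]]
P_inv = [[1, 1], [1, 2]]
Q_inv = [[1, 0], [3, 1]]      # inverse of Q = [[1, 0], [-3, 1]]
A_diag = [[1, 0], [0, 5]]

new_generators = matmul(A, P_inv)       # columns of A P^{-1} generate S
scaled_basis = matmul(Q_inv, A_diag)    # columns of Q^{-1}, scaled by d_1 = 1 and d_2 = 5
```

Both products equal `[[1, 0], [3, 5]]`: the first new generator is the first column $(1, 3)^t$ of $Q^{-1}$, and the second is 5 times its second column $(0, 1)^t$.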

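For instance, let $S$ be the lattice in $\mathbb{R}^2$ generated by the two columns of the
matrix $A$ of Example (4.5). Then

(4.14) $Q^{-1}A' = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 5 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = AP^{-1}.$

The new basis of $\mathbb{Z}^2$ is $(w_1', w_2') = \left(\begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right)$, and the new generators of $S$ are

$$(u_1', u_2') = (u_1, u_2)P^{-1} = (w_1', 5w_2').$$

Theorem (4.3) is striking when it is used to describe the relative position of a
sublattice $S$ in a lattice $L$. To illustrate this, it will be enough to consider plane
lattices. The theorem asserts that there are bases $(v_1, v_2)$ and $(w_1, w_2)$ of $L$ and $S$ such
that the coordinate vectors of $w_j$ with respect to the basis $(v_1, v_2)$ are diagonal. Let
us refer the lattice $L$ back to $\mathbb{Z}^2 \subset \mathbb{R}^2$ by means of the basis $(v_1, v_2)$. Then the
equations $w_i = d_iv_i$ show that $S$ looks like this figure, in which we have taken $d_1 = 2$
and $d_2 = 4$: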



(4.15) Figure. 


Notice the fact, which we have asserted before [Chapter 11 (10.10)], that the index
$[L:S]$ is the ratio of the areas of the parallelograms spanned by bases. This is evident
when the bases are in such a relative position.

In practice, when the lattices $L$ and $S$ are given to us in $\mathbb{R}^2$ at the start, the
change of basis required to get such "commensurable" bases of $L$ and $S$ leads to
rather long and thin parallelograms, as is shown below for Example (4.14).



(4.16) Figure. Diagonalization, applied to a sublattice.

5. GENERATORS AND RELATIONS FOR MODULES

In this section we turn our attention to modules which are not free. We will show
how to describe a large class of modules by means of matrices called presentation
matrices. We will then apply the diagonalization procedure to these matrices to the
study of abelian groups.

As an example to keep in mind, we may consider an abelian group or
$\mathbb{Z}$-module $V$ which is generated by three elements $(v_1, v_2, v_3)$. We suppose that these
generators are subject to the relations

(5.1)   $3v_1 + 2v_2 + v_3 = 0$
        $8v_1 + 4v_2 + 2v_3 = 0$
        $7v_1 + 6v_2 + 2v_3 = 0$
        $9v_1 + 6v_2 + v_3 = 0.$

The information describing this module is summed up in the matrix

(5.2) $A = \begin{bmatrix} 3 & 8 & 7 & 9 \\ 2 & 4 & 6 & 6 \\ 1 & 2 & 2 & 1 \end{bmatrix},$

whose columns are the coefficients of the relations (5.1):

$$(v_1, v_2, v_3)A = (0, 0, 0, 0).$$

As usual, scalars appear on the right side in this matrix product. It is this method of
describing a module which we plan to formalize.

If $(v_1, \ldots, v_m)$ are elements of an $R$-module $V$, equations of the form

(5.3) $a_1v_1 + \cdots + a_mv_m = 0, \quad a_i \in R,$

are called relations among the elements. Of course, when we refer to (5.3) as a
relation, we mean that the formal expression is a relation: If we evaluate it in $V$, we get
$0 = 0$. Since the relation is determined by the $R$-vector $(a_1, \ldots, a_m)^t$, we will refer to
this vector as a relation vector, meaning that (5.3) is true in $V$. By a complete set of
relations we mean a set of relation vectors such that every relation vector is a linear
combination of this set. It is clear that a matrix such as (5.2) will not describe the
module $V$ completely, unless its columns form a complete set of relations.

The concept of a complete set of relations can be confusing. It becomes much
clearer when we work with homomorphisms of free modules rather than directly
with the relations or the relation vectors. Let an $m \times n$ matrix $A$ with entries in a ring
$R$ be given. As we know, left multiplication by this matrix is a homomorphism of
$R$-modules

(5.4) $\varphi: R^n \to R^m.$
In addition to the kernel and image, which we described in the last section when
$R = \mathbb{Z}$, there is another important auxiliary module associated with a homomorphism
$\varphi: W \to W'$ of $R$-modules, called its cokernel. The cokernel of $\varphi$ is defined
to be the quotient module

(5.5) $W'/(\operatorname{im} \varphi).$

If we denote the image of left multiplication by $A$ by $AR^n$, the cokernel of (5.4)
is $R^m/AR^n$. This cokernel is said to be presented by the matrix $A$. More generally,
we will call any isomorphism

(5.6) $\sigma: R^m/AR^n \xrightarrow{\ \sim\ } V$

a presentation of a module $V$, and we say that the matrix $A$ is a presentation matrix
for $V$ if there is such an isomorphism.

For example, the cyclic group $\mathbb{Z}/(5)$ is presented as a $\mathbb{Z}$-module by the $1 \times 1$
integer matrix $[5]$. As another example, let $V$ be the $\mathbb{Z}$-module presented by the
matrix $\begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}$. The columns of this matrix are the relation vectors, so $V$ is generated
by two elements $v_1, v_2$ with the relations $2v_1 + v_2 = -v_1 + 2v_2 = 0$. We may solve
the first relation, obtaining $v_2 = -2v_1$. This allows us to eliminate the second
generator. Substitution into the second relation gives $-5v_1 = 0$. So $V$ can also be
generated by a single generator $v_1$, with the single relation $5v_1 = 0$. This shows that $V$ is
isomorphic to $\mathbb{Z}/(5)$. This $2 \times 2$ matrix also presents the cyclic group $\mathbb{Z}/(5)$.
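The elimination can be checked by a small computation. The sketch below is illustrative only; the function name `to_Z5` is introduced here. It sends the generators to $v_1 \mapsto 1$ and $v_2 \mapsto -2$ in $\mathbb{Z}/(5)$, as dictated by $v_2 = -2v_1$, and confirms that both relation vectors map to zero and that the map is onto.

```python
def to_Z5(x1, x2):
    """Image of x1*v1 + x2*v2 in Z/(5), with v1 -> 1 and v2 -> -2."""
    return (x1 - 2 * x2) % 5

relation_vectors = [(2, 1), (-1, 2)]    # the columns of the presentation matrix

relations_hold = all(to_Z5(*r) == 0 for r in relation_vectors)
image = {to_Z5(x, 0) for x in range(5)}  # multiples of v1 already fill Z/(5)
```

This verifies only that the assignment is compatible with the relations; that there are no further relations is what the elimination argument above shows.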

We will now describe a theoretical method of finding a presentation of a given
module $V$. To carry out this method in practice, the module would have to be given
in a very explicit way. Our first step is to choose a set of generators $(v_1, \ldots, v_m)$. So $V$
must be finitely generated for us to get started. These generators provide us with a
surjective homomorphism

(5.7) $f: R^m \to V,$

sending the column vector $X = (x_1, \ldots, x_m)$ to $v_1x_1 + \cdots + v_mx_m$. The elements of
the kernel of $f$ are the relation vectors. Let us denote this kernel by $W$. By the First
Isomorphism Theorem, $V$ is isomorphic to $R^m/W$.

We repeat the procedure, choosing a set of generators $(w_1, \ldots, w_n)$ for $W$, and
we use these generators to define a surjective homomorphism

(5.8) $g: R^n \to W$

as before. Since $W$ is a submodule of $R^m$, composition of the homomorphism $g$ with
the inclusion $W \subset R^m$ gives us a homomorphism

(5.9) $\varphi: R^n \to R^m.$

This homomorphism is left multiplication by a matrix $A$. By construction, $W$ is the
image of $\varphi$, which is $AR^n$, so $R^m/AR^n = R^m/W \approx V$. Therefore, $A$ is a presentation
matrix for $V$.
The columns of the matrix $A$ are our chosen generators for the module $W$ of
relations:

$$w_1 = \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix}, \ \ldots, \ w_n = \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix}.$$

Since they generate $W$, these columns form a complete set of relations among the
generators $(v_1, \ldots, v_m)$ of the module $V$. Since the columns are relation vectors,

(5.10) $(v_1, \ldots, v_m)A = 0.$

Thus the presentation matrix $A$ for a module $V$ is determined by

(5.11)

(i) a set of generators for $V$, and

(ii) a complete set of relations among these generators.

We have let one point slip by in this description. In order to have a finite set of
generators for the module of relations $W$, this module must be finitely generated.
This does not look like a satisfactory hypothesis, because the relationship of our
original module $V$ with $W$ is unclear. We don't mind assuming that $V$ is finitely
generated, but it isn't good to impose hypotheses on a module which arises in the course
of some auxiliary construction. We will need to examine this point more closely [see
(5.16)]. But except for this point, we can now speak of generators and relations for a
finitely generated $R$-module $V$.

Since the presentation matrix depends on the choices (5.11), many matrices
present the same module, or isomorphic modules. Here are some rules for
manipulating a matrix $A$ without changing the isomorphism class of the module it presents:

(5.12) Proposition. Let $A$ be an $m \times n$ presentation matrix for a module $V$. The
following matrices $A'$ present the same module $V$:

(i) $A' = QAP^{-1}$, where $Q \in GL_m(R)$ and $P \in GL_n(R)$;

(ii) $A'$ is obtained by deleting a column of zeros;

(iii) the $j$th column of $A$ is $e_i$, and $A'$ is obtained from $A$ by deleting the $i$th row and
$j$th column.

Proof.

(i) The module $R^m/AR^n$ is isomorphic to $V$. Since the change of $A$ to $QAP^{-1}$
corresponds to a change of basis in $R^m$ and $R^n$, the isomorphism class of the quotient
module does not change.

(ii) A column of zeros corresponds to the trivial relation, which can be omitted.

(iii) Suppose that the $j$th column of the matrix $A$ is $e_i$. The corresponding relation is
$v_i = 0$. So it holds in the module $V$, and therefore $v_i$ can be left out of the
generating set $(v_1, \ldots, v_m)$. Doing so changes the matrix $A$ by deleting the $i$th row
and $j$th column. □


It may be possible to simplify a matrix quite a lot by these rules. For instance, 
our original example of the integer matrix (5.2) reduces as follows: 



Thus A presents the abelian group Z/(4). 
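One way to carry out this reduction with the rules of (5.12) is the following. It is a sketch: the order of the steps is a choice made here, not necessarily the author's. Rows 1 and 2 are cleared against row 3, the rest of row 3 is cleared by column operations, and rule (iii) is applied whenever a column becomes a standard basis vector.

```latex
\begin{align*}
A = \begin{bmatrix} 3&8&7&9\\ 2&4&6&6\\ 1&2&2&1 \end{bmatrix}
&\xrightarrow{\text{row ops}}
\begin{bmatrix} 0&2&1&6\\ 0&0&2&4\\ 1&2&2&1 \end{bmatrix}
\xrightarrow{\text{column ops}}
\begin{bmatrix} 0&2&1&6\\ 0&0&2&4\\ 1&0&0&0 \end{bmatrix} \\
&\xrightarrow{\text{(iii)}}
\begin{bmatrix} 2&1&6\\ 0&2&4 \end{bmatrix}
\xrightarrow{\text{column ops}}
\begin{bmatrix} 0&1&0\\ -4&2&-8 \end{bmatrix}
\xrightarrow{\text{row op}}
\begin{bmatrix} 0&1&0\\ -4&0&-8 \end{bmatrix} \\
&\xrightarrow{\text{(iii)}}
\begin{bmatrix} -4&-8 \end{bmatrix}
\xrightarrow{\text{column op}}
\begin{bmatrix} -4&0 \end{bmatrix}
\xrightarrow{\text{(ii)}}
\begin{bmatrix} -4 \end{bmatrix}
\xrightarrow{\times(-1)}
\begin{bmatrix} 4 \end{bmatrix}.
\end{align*}
```

Every step other than (ii) and (iii) is an instance of rule (i), since it is a product of elementary row or column operations.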

By definition, an $m \times n$ matrix presents a module by means of $m$ generators
and $n$ relations. But as we see from this example, the number of generators and the
number of relations depend on choices. They are not uniquely determined by the
module.


Consider two more examples: The $2 \times 1$ matrix $\begin{bmatrix} 4 \\ 0 \end{bmatrix}$ presents an abelian group
$V$ by means of two generators $(v_1, v_2)$ and one relation $4v_1 = 0$. We cannot simplify
this matrix. The group which it presents is isomorphic to the product group
$\mathbb{Z}/(4) \times \mathbb{Z}$. On the other hand, the matrix $[4 \ \ 0]$ presents a group with one
generator $v_1$ and two relations, the second of which is the trivial relation. This group is
$\mathbb{Z}/(4)$.


We will now discuss the problem of finite generation of the module of
relations. For modules over a nasty ring, this module needn't be finitely generated, even
though $V$ is. Fortunately this problem does not occur with the rings we have been
studying, as we will now show.


(5.13) Proposition. The following conditions on an $R$-module $V$ are equivalent:

(i) Every submodule $W$ of $V$ is finitely generated;

(ii) ascending chain condition: There is no infinite strictly increasing chain
$W_1 < W_2 < \cdots$ of submodules of $V$.

Proof. Assume that $V$ satisfies the ascending chain condition, and let $W$ be a
submodule of $V$. We select a set $w_1, w_2, \ldots, w_k$ of generators of $W$ in the following way:
If $W = 0$, then $W$ is generated by the empty set. If not, we start with a nonzero
element $w_1 \in W$. To continue, assume that $w_1, \ldots, w_i$ have been chosen, and let $W_i$ be
the submodule generated by these elements. If $W_i$ is a proper submodule of $W$, let
$w_{i+1}$ be an element of $W$ which is not contained in $W_i$. Then $W_1 < W_2 < \cdots$. Since
$V$ satisfies the ascending chain condition, this chain of submodules cannot be
continued indefinitely. Therefore some $W_k$ is equal to $W$. Then $(w_1, \ldots, w_k)$ generates $W$.
The converse follows the proof of Theorem (2.10) of Chapter 11. Assume that every
submodule of $V$ is finitely generated, and let $W_1 \subset W_2 \subset \cdots$ be an infinite increasing
chain of submodules of $V$. Let $U$ denote the union of these submodules. Then $U$ is a
submodule [see Chapter 11 (2.11)]; hence it is finitely generated. Let $u_1, \ldots, u_r$ be
generators for $U$. Each $u_\nu$ is in one of the modules $W_i$, and since the chain is
increasing, there is an $i$ such that all of the generators are in $W_i$. Then the module $U$ they
generate is also in $W_i$, and we have $U \subset W_i \subset W_{i+1} \subset U$. This shows that
$U = W_i = W_{i+1}$ and that the chain is not strictly increasing. □

(5.14) Lemma.

(a) Let $\varphi: V \to W$ be a homomorphism of $R$-modules. If the kernel and the
image of $\varphi$ are finitely generated modules, so is $V$. If $V$ is finitely generated and if
$\varphi$ is surjective, then $W$ is finitely generated. More precisely, suppose that
$(v_1, \ldots, v_n)$ generates $V$ and that $\varphi$ is surjective. Then $(\varphi(v_1), \ldots, \varphi(v_n))$
generates $W$.

(b) Let $W$ be a submodule of an $R$-module $V$. If both $W$ and $V/W$ are finitely
generated, so is $V$. If $V$ is finitely generated, so is $V/W$.


Proof. For the first assertion of (a), we follow the proof of the dimension
formula for linear transformations [Chapter 4 (1.5)], choosing a set of generators
$(u_1, \ldots, u_k)$ for $\ker \varphi$ and a set of generators $(w_1, \ldots, w_m)$ for $\operatorname{im} \varphi$. We also choose
elements $v_i \in V$ such that $\varphi(v_i) = w_i$. Then we claim that the set $(u_1, \ldots, u_k; v_1, \ldots, v_m)$
generates $V$. Let $v \in V$ be arbitrary. Then $\varphi(v)$ is a linear combination of
$(w_1, \ldots, w_m)$, say $\varphi(v) = a_1w_1 + \cdots + a_mw_m$. Let $v' = a_1v_1 + \cdots + a_mv_m$. Then
$\varphi(v') = \varphi(v)$. Hence $v - v' \in \ker \varphi$, so $v - v'$ is a linear combination of
$(u_1, \ldots, u_k)$, say $v - v' = b_1u_1 + \cdots + b_ku_k$. Therefore
$v = a_1v_1 + \cdots + a_mv_m + b_1u_1 + \cdots + b_ku_k$. This shows that the set
$(u_1, \ldots, u_k; v_1, \ldots, v_m)$ generates $V$, as
required. The proof of the second assertion of (a) is easy. Part (b) follows from
part (a) by a consideration of the canonical homomorphism $\pi: V \to V/W$. □

(5.15) Definition. A ring $R$ is called noetherian if every ideal of $R$ is finitely
generated.

Principal ideal domains are obviously noetherian, so the rings $\mathbb{Z}$, $\mathbb{Z}[i]$, and $F[x]$
($F$ a field) are noetherian.

(5.16) Corollary. Let $R$ be a noetherian ring. Every proper ideal $I$ of $R$ is
contained in a maximal ideal.

Proof. If $I$ is not maximal itself, then it is properly contained in a proper ideal
$I_2$, and if $I_2$ is not maximal, it is properly contained in a proper ideal $I_3$, and so on.
By the ascending chain condition (5.13), the chain $I = I_1 < I_2 < I_3 \cdots$ must be
finite. Therefore $I_k$ is maximal for some $k$. □
The relevance of the notion of noetherian ring to our problem is shown by the
following proposition:

(5.17) Proposition. Let $V$ be a finitely generated module over a noetherian ring $R$.
Then every submodule of $V$ is finitely generated.

Proof. It suffices to prove the proposition in the case that $V = R^m$. For
assume that we have proved that the submodules of $R^m$ are finitely generated, for all
$m$. Let $V$ be a finitely generated $R$-module. Then there is a surjective map
$\varphi: R^m \to V$. Given a submodule $S$ of $V$, let $L = \varphi^{-1}(S)$. Then $L$ is a submodule of
the module $R^m$, and hence $L$ is finitely generated. Also, the map $L \to S$ is
surjective. Hence $S$ is finitely generated (5.14).

To prove the proposition when $V = R^m$, we use induction on $m$. A submodule
of $R$ is the same as an ideal of $R$ (1.3). Thus the noetherian hypothesis on $R$ tells us
that the proposition holds for $V = R^m$ when $m = 1$. Suppose $m > 1$. We consider
the projection

$$\pi: R^m \to R^{m-1}$$

given by dropping the last entry: $\pi(a_1, \ldots, a_m) = (a_1, \ldots, a_{m-1})$. Its kernel is
$\{(0, \ldots, 0, a_m)\} \approx R$. Let $W \subset R^m$ be a submodule, and let $\varphi: W \to R^{m-1}$ be the
restriction of $\pi$ to $W$. The image $\varphi(W)$ is finitely generated, by induction. Also,
$\ker \varphi = (W \cap \ker \pi)$ is a submodule of $\ker \pi \approx R$, so it is finitely generated too.
By Lemma (5.14), $W$ is finitely generated, as required. □

This proposition completes the proof of Theorem (4.11).

Since principal ideal domains are noetherian, submodules of finitely generated
modules over these rings are finitely generated. But in fact, most of the rings which
we have been studying are noetherian. This follows from another of Hilbert's
famous theorems:

(5.18) Theorem. Hilbert Basis Theorem: If a ring $R$ is noetherian, then so is the
polynomial ring $R[x]$.

The Hilbert Basis Theorem shows by induction that the polynomial ring $R[x_1, \ldots, x_n]$
in several variables over a noetherian ring $R$ is noetherian, hence that the rings
$\mathbb{Z}[x_1, \ldots, x_n]$ and $F[x_1, \ldots, x_n]$ ($F$ a field) are noetherian. Also, quotients of
noetherian rings are noetherian:

(5.19) Proposition. Let $R$ be a noetherian ring, and let $I$ be an ideal of $R$. The
quotient ring $\bar{R} = R/I$ is noetherian.

Proof. Let $\bar{J}$ be an ideal of $\bar{R}$, and let $J = \pi^{-1}(\bar{J})$ be the corresponding ideal of
$R$, where $\pi: R \to \bar{R}$ is the canonical map. Then $J$ is finitely generated, say by
$(a_1, \ldots, a_m)$. It follows that the finite set $(\bar{a}_1, \ldots, \bar{a}_m)$ generates $\bar{J}$ (5.14). □
Combining this proposition with the Hilbert Basis Theorem gives the following
result:

(5.20) Corollary. Any ring which is a quotient of a polynomial ring over the
integers or over a field is noetherian. □

Proof of the Hilbert Basis Theorem. Assume that $R$ is noetherian, and let $I$ be
an ideal of the polynomial ring $R[x]$. We must show that a finite set of polynomials
suffices to generate this ideal.

Let's warm up by reviewing the case that $R$ is a field. In that case, we may
choose a nonzero polynomial $f \in I$ of lowest degree, say

(5.21) $f(x) = a_nx^n + \cdots + a_1x + a_0, \quad a_n \neq 0,$

and prove that it generates the ideal as follows: Let

(5.22) $g(x) = b_mx^m + \cdots + b_1x + b_0, \quad b_m \neq 0,$

be a nonzero element of $I$. Then the degree $m$ of $g$ is at least $n$. We use induction on
$m$. The polynomial

(5.23) $g(x) - (b_m/a_n)x^{m-n}f(x) = g_1(x)$

is an element of $I$ of degree $< m$. By induction, $g_1$ is divisible by $f$; hence $g$ is
divisible by $f$.
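The step (5.23) can be iterated mechanically when the coefficients lie in a field. The following Python sketch works over the rational numbers; the function name `reduce_by` and the convention of listing coefficients from the constant term upward are assumptions made here for illustration.

```python
from fractions import Fraction

def reduce_by(g, f):
    """Repeat step (5.23): cancel the leading term of g using a multiple of f.

    Polynomials are coefficient lists, constant term first; the last entry of f
    (its leading coefficient) must be nonzero. Returns what is left of g once
    its degree has dropped below that of f (an empty list means zero).
    """
    g = [Fraction(c) for c in g]
    f = [Fraction(c) for c in f]
    n = len(f) - 1
    while len(g) - 1 >= n and any(g):
        m = len(g) - 1
        c = g[m] / f[n]                 # the quotient b_m / a_n
        for k in range(n + 1):
            g[m - n + k] -= c * f[k]    # subtract c * x^(m-n) * f(x)
        while g and g[-1] == 0:
            g.pop()                     # drop the cancelled leading terms
    return g
```

For instance, `reduce_by([-1, 0, 1], [-1, 1])` reduces $x^2 - 1$ by $x - 1$ and returns the empty list, so $x - 1$ divides $x^2 - 1$, while `reduce_by([1, 0, 1], [-1, 1])` leaves the constant 2. The division `g[m] / f[n]` is exactly the point that fails over a general ring.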

Formula (5.23) is the first step in the division with remainder of $g$ by $f$. The
method does not extend directly to arbitrary rings, because division with remainder
requires that the leading coefficient of $f$ be a unit. More precisely, in order to form
the expression (5.23) we need to know that $a_n$ divides $b_m$ in the ring $R$, and there is
no reason for this to be true. We will need more generators.

Let us denote by $A$ the set of leading coefficients of all the polynomials in $I$,
together with the zero element of $R$.

(5.24) Lemma. The set $A$ of leading coefficients of the polynomials in an ideal of
$R[x]$, together with 0, forms an ideal of $R$.

Proof. If $\alpha = a_n$ is the leading coefficient of $f$, then $r\alpha$ is the leading
coefficient of $rf$, unless by chance $r\alpha = 0$. In both cases, $r\alpha \in A$. Next, let $\alpha = a_n$
be the leading coefficient of $f$, and let $\beta = b_m$ be the leading coefficient of $g$, where,
say, $m \ge n$. Then $\alpha$ is also the leading coefficient of $x^{m-n}f$. Hence the coefficient of
$x^m$ in the polynomial $h = x^{m-n}f + g$ is $\alpha + \beta$. This is the leading coefficient of $h$
unless it is zero, and in either case, $\alpha + \beta \in A$. □

We return to the proof of the Hilbert Basis Theorem. According to the lemma,
the set $A$ is an ideal of the noetherian ring $R$, so there exists a finite set of
generators, say $(a_1, \ldots, a_k)$, for this ideal. We choose for each $i$, $1 \le i \le k$, a polynomial
$f_i \in I$ with leading coefficient $a_i$, and we multiply these polynomials by powers of $x$
as necessary, so that their degrees become equal to some common integer $n$.

The set of polynomials $(f_1, \ldots, f_k)$ obtained in this way will allow us to adapt
the induction step (5.23), but it will probably not generate $I$. We have little chance of
finding a polynomial of degree $< n$ in the ideal $(f_1, \ldots, f_k)$. So we must add some
elements of low degree to get generators for our ideal. The following lemma is easy,
and we omit its proof:

(5.25) Lemma. Let $P_n$ denote the set of polynomials in $R[x]$ which have
degree $< n$, together with zero, and let $S_n = I \cap P_n$. Then $S_n$ is an $R$-submodule of
the $R$-module $P_n$.

The $R$-module $P_n$ is generated by the monomials $1, x, \ldots, x^{n-1}$, so it is finitely
generated. Since $R$ is noetherian, we may use Lemma (5.25) and Proposition (5.17) to
conclude that there is a finite set $(h_1, \ldots, h_s)$ of elements which generates $S_n$ as an
$R$-module. We claim that the combined set $(f_1, \ldots, f_k; h_1, \ldots, h_s)$ generates $I$.

Denote by $J$ the ideal generated by this set. By construction, $J \subset I$. We need
to prove the opposite inclusion, and we use induction on the degree of an element
$g \in I$. We denote this degree by $m$. If $m < n$, then $g \in S_n$, and therefore $g$ is a
linear combination of $(h_1, \ldots, h_s)$, with coefficients in $R$. So $g \in J$ in that case.
Assume that $m \ge n$, and let the leading coefficient of $g$ be $b = b_m$. Then $b$ is in the
ideal $A$ of leading coefficients, so it is a linear combination of the generators of that
ideal, say $b = r_1a_1 + \cdots + r_ka_k$. Remembering that $a_i$ is the leading coefficient of
$f_i$, we see that the polynomial

$$p = x^{m-n}\Bigl(\sum_i r_if_i\Bigr)$$

has the same leading coefficient and the same degree as $g$, and it is in $J$. So
$g_1 = g - p$ has degree less than $m$. By induction, $g_1 \in J$, and hence $g \in J$. □
6. THE STRUCTURE THEOREM FOR ABELIAN GROUPS

The Structure Theorem for abelian groups asserts that a finitely generated abelian
group $V$ is a direct sum of cyclic groups. The work of the proof has already been
done. We know that there exists a diagonal presentation matrix for $V$, and what
remains for us to do is to interpret the meaning of this diagonal matrix for the group.

We first need to extend the concept of direct sum from vector spaces to
arbitrary modules. The definition is the same. Let $W_1, \ldots, W_k$ be submodules of a module
$V$. Their sum is the submodule which they generate. It consists of all sums

(6.1) $W_1 + \cdots + W_k = \{v \in V \mid v = w_1 + \cdots + w_k, \text{ with } w_i \in W_i\}.$

The verification that this is a submodule is routine, and it is the same as for sums of
subspaces of a vector space. We say that $V$ is the direct sum of the submodules $W_i$ if

(6.2)

(i) they generate: $V = W_1 + \cdots + W_k$;

(ii) they are independent: If $w_1 + \cdots + w_k = 0$, with $w_i \in W_i$, then $w_i = 0$ for
each $i$.

Thus $V$ is the direct sum of the submodules $W_i$ if every element $v \in V$ can be
written uniquely in the form $v = w_1 + \cdots + w_k$, with $w_i \in W_i$. As with vector spaces,
two submodules $W_1, W_2$ are independent if and only if $W_1 \cap W_2 = 0$ [see Chapter 3
(6.5)].

The symbol $\oplus$ is used to denote direct sums as before. So the notation

(6.3) $V = W_1 \oplus \cdots \oplus W_k$

means that $V$ is the direct sum of the submodules $W_i$.

(6.4) Theorem. Structure Theorem for abelian groups: Let $V$ be a finitely
generated abelian group. Then $V$ is a direct sum of finite cyclic subgroups $C_{d_1}, \ldots, C_{d_k}$ and
a free abelian group $L$:

$$V = C_{d_1} \oplus \cdots \oplus C_{d_k} \oplus L,$$

where the order $d_i$ of $C_{d_i}$ is greater than 1, and $d_1 \mid d_2 \mid \cdots \mid d_k$.

We will use additive notation for the law of composition in the cyclic group here. So
$C_n$ is generated by one element $v$, with one relation $nv = 0$. Thus $C_n$ is isomorphic
to $\mathbb{Z}/(n)$. The isomorphism $\mathbb{Z}/(n) \to C_n$ sends the residue of an integer $r$ to $rv$.

Proof of the theorem. We choose a presentation matrix $A$ for $V$, determined by
a set of generators and a complete set of relations. We can do this because $V$ is
finitely generated and because $\mathbb{Z}$ is a noetherian ring (see Section 5). By Proposition
(5.12), the matrix $A$ may be replaced by $QAP^{-1}$, where $Q$ and $P$ are invertible.
Therefore we may assume that $A$ is diagonal, that the diagonal entries are nonzero, and
that each diagonal entry divides the next. Moreover, we can drop any column of
zeros, and any row and column in which the diagonal entry is 1 (5.12). So we may
assume that the diagonal entries $d_i$ are not 0 or 1. The matrix $A$ will then have the
shape

(6.5) $\begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_k \\ & & 0 & \end{bmatrix}.$

It will therefore be an $m \times k$ matrix, where $k \le m$. The meaning of this in terms of
generators and relations for our module is that $V$ is generated by $m$ elements
$v_1, \ldots, v_m$, and that

(6.6) $d_1v_1 = 0, \ d_2v_2 = 0, \ \ldots, \ d_kv_k = 0$

forms a complete set of relations among these generators.

For $j = 1, \ldots, k$, let us denote by $C_j$ the cyclic subgroup generated by $v_j$. Let $L$
be the subgroup generated by the remaining generators $v_{k+1}, \ldots, v_m$. Since the
columns of (6.5) are a complete set of relations, there is no relation involving these
last $m - k$ generators. Therefore $L$ is a free abelian group of rank $m - k$. We now
verify that $V = C_1 \oplus \cdots \oplus C_k \oplus L$ and that $C_j$ is a cyclic group of order $d_j$.
First, since $V$ is generated by the $v_i$ and since each of the $v_i$ is included in one of
the summands, it is clear that $V$ is the sum of these subgroups. Next, suppose that we
have a relation, say

$z_1 + \cdots + z_k + w = 0$,

where $z_j \in C_j$ and $w \in L$. Since $C_j$ is the cyclic group generated by $v_j$, we can
write $z_j = r_j v_j$ for some integer $r_j$. Similarly, we may write
$w = r_{k+1} v_{k+1} + \cdots + r_m v_m$ for some integers $r_j$. Then the relation has the form

$r_1 v_1 + \cdots + r_m v_m = 0$.

Since the columns of (6.5) form a complete set of relations, the vector $(r_1, \ldots, r_m)^t$ is
a linear combination of these columns. So $r_j = 0$ if $j > k$, which implies that
$w = 0$. In addition, $r_j$ must be divisible by $d_j$ if $j \le k$, say $r_j = d_j s_j$. Then
$z_j = s_j d_j v_j = 0$. Thus the relation was trivial, and this shows that the subgroups are
independent. It also shows that the order of the cyclic group $C_j$ is $d_j$. So we have
$V = C_1 \oplus \cdots \oplus C_k \oplus L$, as required. □
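The proof is effectively a procedure: diagonalize the presentation matrix by integer row and column operations, discard diagonal entries equal to 1, and read off the cyclic orders and the free rank. The following Python sketch illustrates this, assuming the convention of the text that rows index generators and columns index relations. The names `smith_diagonal` and `structure` are illustrative choices, and the reduction strategy is one simple variant of integer diagonalization, not necessarily the procedure of Section 4.

```python
def smith_diagonal(rows):
    """Nonzero diagonal entries d_1 | d_2 | ... obtained from an integer
    matrix by integer row and column operations."""
    a = [list(r) for r in rows]
    m = len(a)
    n = len(a[0]) if a else 0
    diag = []
    t = 0
    while t < min(m, n):
        # Locate a nonzero entry of least absolute value in the block a[t:, t:].
        entries = [(abs(a[i][j]), i, j)
                   for i in range(t, m) for j in range(t, n) if a[i][j]]
        if not entries:
            break  # the remaining block is zero
        _, pi, pj = min(entries)
        a[t], a[pi] = a[pi], a[t]              # row swap
        for row in a:                          # column swap
            row[t], row[pj] = row[pj], row[t]
        p = a[t][t]
        # Reduce column t and row t modulo the pivot.
        for i in range(t + 1, m):
            q = a[i][t] // p
            for j in range(t, n):
                a[i][j] -= q * a[t][j]
        for j in range(t + 1, n):
            q = a[t][j] // p
            for i in range(t, m):
                a[i][j] -= q * a[i][t]
        if (any(a[i][t] for i in range(t + 1, m))
                or any(a[t][j] for j in range(t + 1, n))):
            continue  # a smaller nonzero remainder appeared; pick a new pivot
        # The pivot must divide every remaining entry; otherwise mix rows.
        bad = next((i for i in range(t + 1, m)
                    for j in range(t + 1, n) if a[i][j] % p), None)
        if bad is not None:
            for j in range(t, n):
                a[t][j] += a[bad][j]
            continue
        diag.append(abs(p))
        t += 1
    return diag


def structure(rows):
    """Cyclic orders (each dividing the next) and free rank of the abelian
    group presented by `rows` (rows = generators, columns = relations)."""
    diag = smith_diagonal(rows)
    orders = [d for d in diag if d != 1]
    return orders, len(rows) - len(diag)
```

For instance, `structure([[2, 0], [0, 3]])` describes the group with generators $v_1, v_2$ and relations $2v_1 = 0$, $3v_2 = 0$; the sketch reports a single cyclic summand of order 6 and free rank 0, in agreement with $C_2 \oplus C_3 \approx C_6$.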

A finite abelian group is finitely generated, so as stated above the Structure 
Theorem decomposes a finite abelian group into a direct sum of finite cyclic groups,
in which the order of each summand divides the next. The free abelian summand is 
zero in this case. It is sometimes convenient to decompose the cyclic groups further, 
into cyclic groups of prime power order. This decomposition is based on Proposition 
(8.4) of Chapter 2, which we restate here: 

(6.7) Lemma. Let $r, s$ be relatively prime integers. The cyclic group $C_{rs}$ of order
$rs$ is the direct sum of cyclic subgroups of orders $r$ and $s$. □
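Statement (6.7) can be checked by brute force for small orders. The sketch below tests whether the map $k \mapsto (k \bmod r,\ k \bmod s)$ from $\mathbb{Z}/(rs)$ to $\mathbb{Z}/(r) \times \mathbb{Z}/(s)$ is bijective; the function name is a hypothetical choice, and the check is an illustration, not a proof.

```python
from math import gcd


def crt_map_is_bijective(r, s):
    """Check whether k -> (k mod r, k mod s) is a bijection from Z/(rs)
    onto Z/(r) x Z/(s)."""
    images = {(k % r, k % s) for k in range(r * s)}
    return len(images) == r * s
```

In the cases one can enumerate, the map is bijective exactly when `gcd(r, s) == 1`, which is why the hypothesis of relative primality cannot be dropped.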

Combining this lemma with the Structure Theorem yields the following: 

(6.8) Corollary. Structure Theorem, alternate form: Every finitely generated abelian
group is a direct sum of cyclic groups of prime power orders and of a free
abelian group. □
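Passing from the first form of the Structure Theorem to the alternate form amounts to factoring each order $d_i$ into prime powers and applying (6.7) repeatedly. A minimal sketch, with hypothetical function names and trial division as an assumed (and not efficient) factoring method:

```python
def prime_power_factors(n):
    """Split n > 1 into its prime power factors, e.g. 60 -> [4, 3, 5]."""
    factors = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append(q)
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def elementary_divisors(orders):
    """Prime power orders of the cyclic summands obtained by splitting
    each cyclic group C_d according to (6.7)."""
    return sorted(q for d in orders for q in prime_power_factors(d))
```

For example, the group $C_2 \oplus C_{60}$ splits as $C_2 \oplus C_3 \oplus C_4 \oplus C_5$, which is what `elementary_divisors([2, 60])` computes.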

It is natural to ask whether the orders of the cyclic subgroups which decompose 
a given finite abelian group are uniquely determined by the group. If the order of V 





is a product of distinct primes, there is no problem. For example, if the order is 30,
then $V$ must be isomorphic to $C_2 \oplus C_3 \oplus C_5$. But can the same group be both
$C_2 \oplus C_2 \oplus C_4$ and $C_4 \oplus C_4$? It is not difficult to show that this is impossible by
counting elements of orders 1 or 2. The group $C_4 \oplus C_4$ contains four such elements, while
$C_2 \oplus C_2 \oplus C_4$ contains eight. This counting method will always work.
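The count quoted above is easy to reproduce. The sketch below enumerates a product of cyclic groups $\mathbb{Z}/(d_1) \times \cdots \times \mathbb{Z}/(d_k)$ and counts the elements $x$ with $qx = 0$, that is, the elements whose order divides $q$. The function names are illustrative, and the closed formula $\prod \gcd(q, d_i)$ is included only as a cross-check.

```python
from itertools import product
from math import gcd, prod


def count_killed_by(orders, q):
    """Number of x in Z/(d_1) x ... x Z/(d_k) with q*x = 0 (brute force)."""
    return sum(
        all((q * x) % d == 0 for x, d in zip(elem, orders))
        for elem in product(*(range(d) for d in orders))
    )


def count_killed_by_formula(orders, q):
    """Same count, using gcd(q, d) elements of order dividing q in Z/(d)."""
    return prod(gcd(q, d) for d in orders)
```

With `q = 2`, the orders `[4, 4]` give 4 and the orders `[2, 2, 4]` give 8, so the two groups of order 16 cannot be isomorphic.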

(6.9) Theorem. Uniqueness for the Structure Theorem:

(a) Suppose that a finite abelian group $V$ is a direct sum of cyclic groups
$C_{d_1} \oplus \cdots \oplus C_{d_k}$ where $d_1 \mid d_2 \mid \cdots$. The integers $d_j$ are determined by
the group $V$.

(b) The same is true if the decomposition is into prime power orders, that is, if
each $d_j$ is the power of a prime.

We leave the proof as an exercise. □

The counting of elements is simplified notationally by representing a direct
sum as a product. Let $R$ be a ring. The direct product of $R$-modules $W_1, \ldots, W_k$ is the
product set $W_1 \times \cdots \times W_k$ of $k$-tuples:

(6.10) $W_1 \times \cdots \times W_k = \{(w_1, \ldots, w_k) \mid w_i \in W_i\}$.

It is made into a module by vector addition and scalar multiplication:

$(w_1, \ldots, w_k) + (w_1', \ldots, w_k') = (w_1 + w_1', \ldots, w_k + w_k'), \quad r(w_1, \ldots, w_k) = (rw_1, \ldots, rw_k)$.

Verification of the axioms for a module is routine.

Direct products and direct sums are isomorphic, as the following proposition
shows:

(6.11) Proposition. Let $W_1, \ldots, W_k$ be submodules of an $R$-module $V$.

(a) The map $\sigma: W_1 \times \cdots \times W_k \to V$ defined by

$\sigma(w_1, \ldots, w_k) = w_1 + \cdots + w_k$

is a homomorphism of $R$-modules, and its image is the sum $W_1 + \cdots + W_k$.

(b) The homomorphism $\sigma$ is an isomorphism if and only if $V$ is the direct sum of
the submodules $W_i$.

We have seen similar arguments several times before, so we omit the proof. Note
that the second part of the proposition is analogous to the statement that the map
(2.5) $R^k \to V$ defined by a set $(v_1, \ldots, v_k)$ is bijective if and only if this set is a
basis. □

Since a cyclic group $C_d$ of order $d$ is isomorphic to the standard cyclic group
$\mathbb{Z}/(d)$, we can use Proposition (6.11) to restate the Structure Theorem as follows:

(6.12) Theorem. Product version of the Structure Theorem: Every finitely generated
abelian group $V$ is isomorphic to a direct product of cyclic groups

$\mathbb{Z}/(d_1) \times \cdots \times \mathbb{Z}/(d_k) \times \mathbb{Z}^r$,

where $d_i$, $r$ are integers. There is a decomposition in which each $d_i$ divides the next
and one in which each $d_i$ is a prime power. □

This classification of abelian groups carries over to Euclidean domains without
essential change. Since a Euclidean domain $R$ is noetherian, any finitely generated
$R$-module $V$ has a presentation matrix (5.6), and by the diagonalization theorem
(4.6) there is a presentation matrix $A$ which is diagonal.

To carry along the analogy with abelian groups, we define a cyclic $R$-module $V$
to be one which is generated by a single element $v$. This is equivalent to saying
that $V$ is isomorphic to a quotient module $R/I$, where $I$ is the ideal of elements $a \in R$
such that $av = 0$. Namely, the map $\varphi: R \to V$ sending $r \mapsto rv$ is a surjective
homomorphism of modules because $v$ generates $V$, and the kernel of $\varphi$, the module
of relations, is a submodule of $R$, an ideal $I$ (1.3). So $V$ is isomorphic to $R/I$ by the
First Isomorphism Theorem. Conversely, if $R/I \to V$ is an isomorphism, the image
of 1 will generate $V$. If $R$ is a Euclidean domain, then the ideal $I$ will be principal,
so $V$ will be isomorphic to $R/(a)$ for some $a \in R$. In this case the module of
relations will also be generated by a single element.

Proceeding as in the case of abelian groups, one proves the following theorem: 

(6.13) Theorem. Structure Theorem for modules over Euclidean domains: 

(a) Let $V$ be a finitely generated module over a Euclidean domain $R$. Then $V$ is a
direct sum of cyclic modules $C_j$ and a free module $L$. Equivalently, there is an
isomorphism

$\varphi: V \to R/(d_1) \times \cdots \times R/(d_k) \times R^r$

of $V$ with a direct product of cyclic modules $R/(d_i)$ and a free module $R^r$,
where $r$ is nonnegative, the elements $d_1, \ldots, d_k$ are not units and not zero, and $d_i$
divides $d_{i+1}$ for each $i = 1, \ldots, k-1$.

(b) The same assertion as (a), except that the condition that $d_i$ divides $d_{i+1}$ is
replaced by this: Each $d_i$ is a power of a prime element of $R$. Thus $V$ is isomorphic
to a product of the form

$R/(p_1^{e_1}) \times \cdots \times R/(p_n^{e_n}) \times R^r$,

with repetitions of primes allowed. □

For example, consider the $F[t]$-module $V$ presented by the matrix $A$ of Example
(4.7). According to (5.12), it is also presented by the diagonal matrix

$A' = \begin{bmatrix} 1 & 0 \\ 0 & (t-1)^2(t-2) \end{bmatrix}$,

and we can drop the first row and column from this matrix (5.12). So $V$ is presented
by the $1 \times 1$ matrix $[g]$, where $g(t) = (t-1)^2(t-2)$. This means that $V$ is a cyclic
module, isomorphic to $F[t]/(g)$. Since $g$ has two relatively prime factors, $V$ can be
further decomposed. It is isomorphic to the direct product of two cyclic modules

(6.14) $V \approx F[t]/(g) \approx [F[t]/(t-1)^2] \times [F[t]/(t-2)]$. □

With slightly more work, Theorem (6.13) can be extended to modules over
any principal ideal domain. It is also true that the prime powers occurring in (b) are
unique up to unit factors. A substitute for the counting argument which proves
Theorem (6.9) must be found to prove this fact. We will not carry out the proof.


7. APPLICATION TO LINEAR OPERATORS

In this section we apply the theory developed in the last section in a novel way to
linear operators on vector spaces over a field. This application provides a good
example of the way "proof analysis" can lead to new results in mathematics. The
method developed first for abelian groups is extended formally to modules over
Euclidean domains. Then it is applied to a concrete new situation in which the ring is a
polynomial ring. This was not the historical development. The theories for abelian
groups and for linear operators were developed independently and were tied together
later. But it is striking that the two cases, abelian groups and linear operators, can be
formally analogous and yet end up looking so different when the same theory is
applied to them.

The key observation which allows us to proceed is that if we are given a linear
operator

(7.1) $T: V \to V$

on a vector space over a field $F$, then we can use this operator to make $V$ into a
module over the polynomial ring $F[t]$. To do so, we have to define multiplication of a
vector $v$ by a polynomial $f(t) = a_n t^n + \cdots + a_1 t + a_0$. We set

(7.2) $f(t)v = a_n T^n(v) + a_{n-1} T^{n-1}(v) + \cdots + a_1 T(v) + a_0 v$.

The right side can be written as $[f(T)](v)$, where $f(T)$ denotes the linear operator
$a_n T^n + a_{n-1} T^{n-1} + \cdots + a_1 T + a_0 I$ obtained by substituting $T$ for $t$. The brackets
have been added only for clarity. With this notation, we obtain the formulas

(7.3) $tv = T(v)$ and $f(t)v = [f(T)](v)$.

The fact that rule (7.2) makes $V$ into an $F[t]$-module is easy to verify. The formulas
(7.3) may appear tautological. They raise the question of why we need a new symbol
$t$. But remember that $f(t)$ is a formal polynomial, while $f(T)$ denotes a certain linear
operator.
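Rule (7.2) is simple to carry out numerically. The sketch below assumes that $T$ is given by a square matrix `A` acting on column vectors and that a polynomial is stored as its coefficient list `[a_0, a_1, ..., a_n]`; these representations and the function names are choices made for the illustration.

```python
def mat_vec(A, v):
    """Apply the matrix A to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def poly_act(coeffs, A, v):
    """Return f(t)v = a_0 v + a_1 T(v) + ... + a_n T^n(v), where T has
    matrix A and coeffs = [a_0, a_1, ..., a_n]."""
    result = [0] * len(v)
    power = list(v)                # T^0(v) = v
    for a in coeffs:
        result = [r + a * p for r, p in zip(result, power)]
        power = mat_vec(A, power)  # advance to the next power of T
    return result


def poly_mul(f, g):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out
```

One of the module axioms, $(fg)v = f(gv)$, can then be spot-checked by comparing `poly_act(poly_mul(f, g), A, v)` with `poly_act(f, A, poly_act(g, A, v))` on sample data.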

Conversely, let $V$ be an $F[t]$-module. Then scalar multiplication of elements of
$V$ by a polynomial $f(t)$ is defined. In particular, we are given a rule for multiplying
by the constant polynomials, the elements of $F$. If we keep the rule for multiplying
by constants but forget for the moment about multiplication by nonconstant
polynomials, then the axioms (1.1) show that $V$ becomes a vector space over $F$. Next, we
can multiply elements of $V$ by the polynomial $t$. Let us denote the operation of
multiplication by $t$ on $V$ as $T$. Thus $T$ is the map

(7.4) $T: V \to V$, defined by $T(v) = tv$.

This map is a linear operator on $V$, when it is considered as a vector space over $F$.
For $t(v + v') = tv + tv'$ by the distributive law (1.1), and hence
$T(v + v') = T(v) + T(v')$. And if $c \in F$, then $tcv = ctv$ by the associative law (1.1) and the
commutative law in $F[t]$; hence $T(cv) = cT(v)$. So an $F[t]$-module $V$ provides us
with a linear operator on a vector space.

The operations we have described, going from linear operators to modules and
back, are inverses of each other:

(7.5) Linear operator on an $F$-vector space and $F[t]$-module

are equivalent concepts.

We will want to apply this observation to finite-dimensional vector spaces, but
let us note in passing the linear operator which corresponds to the free $F[t]$-module
$F[t]$ of rank 1. We know that $F[t]$ is infinite-dimensional when it is considered as a
vector space over $F$. The monomials $(1, t, t^2, \ldots)$ form a basis, and we can use this
basis to identify $F[t]$ with the space $Z$ of infinite $F$-vectors, as in Chapter 10 (2.8):

$Z = \{(a_0, a_1, a_2, \ldots) \mid a_i \in F$ and only finitely many $a_i$ are nonzero$\}$.

Multiplication by $t$ on $F[t]$ corresponds to the shift operator $T$:

$(a_0, a_1, a_2, \ldots) \mapsto (0, a_0, a_1, a_2, \ldots)$.

Thus, up to isomorphism, the free $F[t]$-module of rank 1 corresponds to the shift
operator on the space $Z$.

We now begin our application to linear operators. Given a linear operator $T$ on
a vector space $V$ over $F$, we may also view $V$ as an $F[t]$-module. Let us suppose that
$V$ is finite-dimensional as a vector space, say of dimension $n$. Then it is certainly
finitely generated as a module, and hence it has a presentation matrix. There is some
danger of confusion here because there are two matrices around: the presentation
matrix for the module $V$, and the matrix of the linear operator $T$. The presentation
matrix is an $r \times s$ matrix with polynomial entries, where $r$ is the number of chosen
generators for the module and $s$ is the number of relations. On the other hand, the
matrix of the linear operator is an $n \times n$ matrix whose entries are scalars, where $n$ is
the dimension of $V$ as a vector space. Both matrices contain the information needed
to describe the module and the linear operator.

Regarding $V$ as an $F[t]$-module, we can apply Theorem (6.13) to conclude that
$V$ is a direct sum of cyclic submodules, say

$V = W_1 \oplus \cdots \oplus W_k$,

where $W_i$ is isomorphic to $F[t]/(p_i^{e_i})$, $p_i(t)$ being an irreducible polynomial in $F[t]$.
There is no free summand, because we are assuming that $V$ is finite-dimensional.

We have two tasks: to interpret the meaning of the direct sum decomposition
for the linear operator $T$, and to describe the linear operator when the module is
cyclic. It will not be surprising that the direct sum decomposition gives us a block
decomposition of the matrix of $T$, when a suitable basis is chosen. The reason is that
each of the subspaces $W_i$ is $T$-invariant, because $W_i$ is an $F[t]$-submodule.
Multiplication by $t$ carries $W_i$ to itself, and $t$ operates on $V$ as the linear operator $T$. We
choose bases $\mathbf{B}_i$ for the subspaces $W_i$. Then the matrix of $T$ with respect to the basis
$\mathbf{B} = (\mathbf{B}_1, \ldots, \mathbf{B}_k)$ has the desired block form [Chapter 4 (3.8)].

Next, let $W$ be a cyclic $F[t]$-module. Then $W$ is generated as a module by a
single element $w$; in other words, every element of $W$ can be written in the form

$g(t)w = b_r t^r w + \cdots + b_1 tw + b_0 w$,

where $g(t) = b_r t^r + \cdots + b_1 t + b_0 \in F[t]$. This implies that the elements
$w, tw, t^2 w, \ldots$ span $W$ as a vector space. In terms of the linear operator, $W$ is
spanned by the vectors $w, T(w), T^2(w), \ldots$.

Various relations between properties of an $F[t]$-module and the corresponding
linear operator are summed up in the table below.

(7.6) Dictionary.

F[t]-module                      Linear operator T

multiplication by t              operation of T

free module of rank 1            shift operator

submodule                        T-invariant subspace

direct sum of submodules         direct sum of T-invariant subspaces

cyclic module generated by v     vector space spanned by v, T(v), T^2(v), ...

Let us now compute the matrix of a linear operator $T$ on a vector space which
corresponds to a cyclic $F[t]$-module. Since every ideal of $F[t]$ is principal, such a
module will be isomorphic to a module of the form

(7.7) $W = F[t]/(f)$,

where $f = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0$ is a polynomial in $F[t]$. Let us use the
symbol $w_0$ to denote the residue of 1 in $W$. This is our chosen generator for the
module. Then the relation $fw_0 = 0$ holds, and $f$ generates the module of relations.

The elements $w_0, tw_0, \ldots, t^{n-1}w_0$ form a basis for $F[t]/(f)$ [see Chapter 10
(5.7)]. Let us denote this basis by $w_i = t^i w_0$. Then

$tw_0 = w_1, \quad tw_1 = w_2, \quad \ldots, \quad tw_{n-2} = w_{n-1}$,

and also $fw_0 = 0$. This last relation can be rewritten using the others in order to
determine the action of $t$ on $w_{n-1}$:

$(t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0)w_0 = tw_{n-1} + a_{n-1}w_{n-1} + \cdots + a_1 w_1 + a_0 w_0 = 0$.

Since $T$ acts as multiplication by $t$, we have

$T(w_0) = w_1, \quad T(w_1) = w_2, \quad \ldots, \quad T(w_{n-2}) = w_{n-1}$,

and

$T(w_{n-1}) = -a_{n-1}w_{n-1} - \cdots - a_1 w_1 - a_0 w_0$.

This determines the matrix of $T$. It has the form illustrated below for various values
of $n$:

(7.8) $[-a_0], \quad \begin{bmatrix} 0 & -a_0 \\ 1 & -a_1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & -a_0 \\ 1 & 0 & -a_1 \\ 0 & 1 & -a_2 \end{bmatrix}, \quad \ldots$



(7.9) Theorem. Let $T$ be a linear operator on a finite-dimensional vector space $V$
over a field $F$. There is a basis for $V$ with respect to which the matrix of $T$ is made up
of blocks of the type (7.8). □
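The block attached to a cyclic module $F[t]/(f)$ can be written down directly from the relations $T(w_i) = w_{i+1}$ and $T(w_{n-1}) = -a_{n-1}w_{n-1} - \cdots - a_0 w_0$. The sketch below builds that matrix, with columns holding the coordinates of the images of the basis vectors, and checks that $f(T) = 0$ on an example. The function names and the coefficient-list convention `[a_0, ..., a_{n-1}]` are assumptions of the illustration.

```python
def companion_matrix(a):
    """Matrix of multiplication by t on F[t]/(f) in the basis w_i = t^i w_0,
    where f = t^n + a[n-1] t^(n-1) + ... + a[1] t + a[0]."""
    n = len(a)
    M = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        M[i + 1][i] = 1          # column i: T(w_i) = w_{i+1}
    for i in range(n):
        M[i][n - 1] = -a[i]      # last column: T(w_{n-1}) = -sum a_i w_i
    return M


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def evaluate_monic(a, M):
    """Return f(M) for the monic polynomial f with lower coefficients a."""
    n = len(M)
    result = [[0] * n for _ in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # M^0 = I
    for c in list(a) + [1]:
        result = [[result[i][j] + c * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, M)
    return result
```

For $f = (t-1)^2 = t^2 - 2t + 1$ the call `companion_matrix([1, -2])` gives the $2 \times 2$ block with columns $(0, 1)^t$ and $(-1, 2)^t$, and `evaluate_monic` confirms that $f$ annihilates it.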


Such a form for the matrix of a linear operator is called a rational canonical
form. It isn't particularly nice, but it is the best form available for an arbitrary field.

For example, the module (6.14) is a direct sum of two modules. Its rational
canonical form is

(7.10) $\begin{bmatrix} 0 & -1 & \\ 1 & 2 & \\ & & 2 \end{bmatrix}$.



We now consider more carefully the case that $F$ is the field of complex numbers.
Every irreducible polynomial in $\mathbb{C}[t]$ is linear, $p(t) = t - a$, so according to
Theorem (6.13), every finite-dimensional $\mathbb{C}[t]$-module is a direct sum of submodules
isomorphic to ones of the form

(7.11) $W = \mathbb{C}[t]/(t - a)^n$.

We let $w_0$ denote the residue of 1 in $W$ as before, but we make a different choice of
basis for $W$ this time, setting $w_i = (t-a)^i w_0$. Then

$(t-a)w_0 = w_1, \quad (t-a)w_1 = w_2, \quad \ldots, \quad (t-a)w_{n-2} = w_{n-1}$, and $(t-a)w_{n-1} = 0$.

We replace $t$ by $T$ and solve, obtaining

$Tw_i = w_{i+1} + aw_i$,

for $i = 0, \ldots, n-2$, and

$Tw_{n-1} = aw_{n-1}$.

The matrix of $T$ has the form

(7.12) $[a], \quad \begin{bmatrix} a & 0 \\ 1 & a \end{bmatrix}, \quad \begin{bmatrix} a & 0 & 0 \\ 1 & a & 0 \\ 0 & 1 & a \end{bmatrix}, \quad \ldots$

These matrices are called Jordan blocks. Thus we obtain the following theorem:

(7.13) Theorem. Let $T: V \to V$ be a linear operator on a finite-dimensional
complex vector space. There is a basis of $V$ such that the matrix of $T$ with respect to
this basis is made up of Jordan blocks. □



Such a matrix is said to be in Jordan form, or to be a Jordan matrix. Note that it is
lower triangular, so the diagonal entries are its eigenvalues. Jordan form is much
nicer than rational canonical form.

It is not hard to show that every Jordan block has a unique eigenvector, up to
scalar multiple.

Given any square complex matrix $A$, the theorem asserts that $PAP^{-1}$ is in Jordan
form for some invertible matrix $P$. We often refer to $PAP^{-1}$ as "the Jordan form
for $A$." It is unique up to permutation of the blocks, because the terms in the direct
sum decomposition are unique, though we have not proved this.

The Jordan form of the module (6.14) is made up of two Jordan blocks:

(7.14) $\begin{bmatrix} 1 & & \\ 1 & 1 & \\ & & 2 \end{bmatrix}$.

One important application of Jordan form is to the explicit solution of systems
of first-order linear differential equations

(7.15) $\dfrac{dX}{dt} = AX$.

As we saw in Chapter 4 (7.11), the problem of solving this equation reduces easily
to solving the equation $\dfrac{dX}{dt} = \tilde{A}X$, where $\tilde{A} = PAP^{-1}$ is any similar matrix. So
provided that we can determine the Jordan form $\tilde{A}$ of the given matrix $A$, it is enough to
solve the resulting system. This in turn reduces to the case of a single Jordan block.
One example of a $2 \times 2$ Jordan block was computed in Chapter 4 (8.18).

The solutions for an arbitrary $k \times k$ Jordan block $A$ can be determined by
computing the matrix exponential. We denote by $N$ the $k \times k$ matrix obtained by
substituting $a = 0$ into (7.12). Then $N^k = 0$. Hence

$e^{Nt} = I + Nt/1! + \cdots + N^{k-1}t^{k-1}/(k-1)!$.

This is a lower triangular matrix which is constant on diagonal bands and whose
entries on the $i$th diagonal band below the diagonal are $t^i/i!$. Since $N$ and $aI$
commute,

$e^{At} = e^{at}e^{Nt} = e^{at}(I + Nt/1! + \cdots + N^{k-1}t^{k-1}/(k-1)!)$.
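Thus if $A$ is the $3 \times 3$ Jordan block

$A = \begin{bmatrix} a & & \\ 1 & a & \\ & 1 & a \end{bmatrix}$,

then

$e^{At} = e^{at}\begin{bmatrix} 1 & & \\ t & 1 & \\ t^2/2! & t & 1 \end{bmatrix}$.

Theorem (8.14) of Chapter 4 tells us that the columns of this matrix form a basis for
the space of solutions of the differential equation (7.15).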
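The finite series for $e^{Nt}$ can be summed exactly with rational arithmetic. The sketch below does this at a rational value of $t$ and also returns $N^k$, so that nilpotency can be confirmed; the scalar factor $e^{at}$ is left out because it is not rational. The function names are illustrative, and $N$ is taken to be the matrix with ones just below the diagonal, as in (7.12) with $a = 0$.

```python
from fractions import Fraction
from math import factorial


def nilpotent_block(k):
    """The k x k matrix N with ones on the first band below the diagonal."""
    return [[Fraction(int(i == j + 1)) for j in range(k)] for i in range(k)]


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def exp_nilpotent(k, t):
    """Return (I + Nt/1! + ... + N^(k-1) t^(k-1)/(k-1)!, N^k)."""
    N = nilpotent_block(k)
    t = Fraction(t)
    total = [[Fraction(0)] * k for _ in range(k)]
    power = [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]
    for m in range(k):
        coeff = t ** m / factorial(m)
        total = [[total[i][j] + coeff * power[i][j] for j in range(k)]
                 for i in range(k)]
        power = mat_mul(power, N)   # after the loop this is N^k
    return total, power
```

For `k = 3` and `t = 2` the first returned matrix has rows `[1, 0, 0]`, `[2, 1, 0]`, `[2, 2, 1]`, matching the entries $t^i/i!$ on the $i$th band below the diagonal.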

Computing the Jordan form of a given matrix requires finding the roots of its
characteristic polynomial $p(t)$. If the roots $\alpha_1, \ldots, \alpha_n$ are distinct, the Jordan form is
diagonal:

$\begin{bmatrix} \alpha_1 & & \\ & \ddots & \\ & & \alpha_n \end{bmatrix}$.

Suppose that the root $\alpha_i = a$ is an $r$-fold root of $p(t)$. Then there are various
possibilities for the part of the Jordan matrix with diagonal entries $a$. Here are the
possibilities for small $r$:





(In this list $J_m$ denotes the $m \times m$ Jordan block (7.12) with diagonal entry $a$, and
$\oplus$ denotes the block diagonal arrangement of the listed blocks.)

$r = 1$: $J_1$

$r = 2$: $J_2$; $J_1 \oplus J_1$

$r = 3$: $J_3$; $J_2 \oplus J_1$; $J_1 \oplus J_1 \oplus J_1$

$r = 4$: $J_4$; $J_3 \oplus J_1$; $J_2 \oplus J_2$; $J_2 \oplus J_1 \oplus J_1$; $J_1 \oplus J_1 \oplus J_1 \oplus J_1$

They can be distinguished by computing eigenvectors of certain operators related to
$T$. The space of solutions to the system of equations

$(A - aI)x = 0$

is the space of eigenvectors of $A$ with eigenvalue $a$. One can solve this system
explicitly, given $A$ and $a$. If $r = 4$, the dimensions of the solution space in the five
cases shown above are 1, 2, 2, 3, 4 respectively, because one eigenvector is
associated to each block. So this dimension distinguishes all cases except the second and
third. These remaining two cases can be distinguished by the matrix $(A - aI)^2$. It is
zero in case three and not zero in case two.
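The dimension counts for $r = 4$ can be reproduced by building each Jordan matrix and computing the nullity of $(A - aI)$ and of $(A - aI)^2$ by exact row reduction. This is a sketch: the helper names are hypothetical, a list of block sizes stands for the Jordan matrix, and rational arithmetic is used so that ranks are exact.

```python
from fractions import Fraction


def jordan_matrix(a, sizes):
    """Block diagonal matrix of lower triangular Jordan blocks with
    eigenvalue a and the given block sizes."""
    n = sum(sizes)
    J = [[Fraction(0)] * n for _ in range(n)]
    start = 0
    for s in sizes:
        for i in range(s):
            J[start + i][start + i] = Fraction(a)
            if i > 0:
                J[start + i][start + i - 1] = Fraction(1)
        start += s
    return J


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def rank(M):
    """Rank by Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def nullities(A, a, max_power):
    """Nullities of (A - aI)^v for v = 1, ..., max_power."""
    n = len(A)
    B = [[A[i][j] - (a if i == j else 0) for j in range(n)] for i in range(n)]
    P, out = B, []
    for _ in range(max_power):
        out.append(n - rank(P))
        P = mat_mul(P, B)
    return out
```

The block sizes `[3, 1]` and `[2, 2]` both give nullity 2 for $A - aI$, but the nullities of $(A - aI)^2$ are 3 and 4 respectively, which separates the second and third cases.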

It can be shown that the dimensions of the null spaces of the operators
$(A - aI)^{\nu}$, $\nu = 1, 2, \ldots, r/2$, distinguish the Jordan forms in all cases.


8. FREE MODULES OVER POLYNOMIAL RINGS


The structures of modules over a ring become increasingly complicated with
increasing complication of the ring. It is even difficult to determine whether or not an
explicitly presented module is free. In this section we describe, without proof, a
theorem which characterizes free modules over polynomial rings. This theorem was
proved by Quillen and Suslin in 1976.

Let $R = \mathbb{C}[x_1, \ldots, x_k]$ be the polynomial ring in $k$ variables, and let $V$ be a
finitely generated $R$-module. We choose a presentation matrix $A$ for the module. The
entries of $A$ will be polynomials $a_{ij}(x)$, and if $A$ is an $m \times n$ matrix, then $V$ is
isomorphic to the cokernel $R^m/AR^n$ of multiplication by $A$ on $R$-vectors. We can evaluate
the matrix entries $a_{ij}(x)$ at any point $p = (p_1, \ldots, p_k)$ of $\mathbb{C}^k$, obtaining a complex
matrix $A(p)$ whose $i,j$-entry is $a_{ij}(p)$.

(8.1) Theorem. Let $V$ be a finitely generated module over the polynomial ring
$\mathbb{C}[x_1, \ldots, x_k]$, and let $A$ be an $m \times n$ presentation matrix for $V$. Denote by $A(p)$ the
evaluation of $A$ at a point $p \in \mathbb{C}^k$. Then $V$ is a free module of rank $r$ if and only if
$A(p)$ has rank $m - r$ for every point $p$. □

The proof of this theorem requires background which we don't have. However,
we can easily see how to use it to determine whether or not a given module is
free. For example, consider the polynomial ring in two variables: $R = \mathbb{C}[x, y]$. Let
$V$ be the module presented by the $4 \times 2$ matrix

(8.2) $A = \begin{bmatrix} 1 & x \\ y & x+3 \\ x & y \\ x^2 & y^2 \end{bmatrix}$.

So $V$ has four generators and two relations. Let $p$ be a point $(a, b) \in \mathbb{C}^2$. The two
columns of the matrix $A(p)$ are

$v_1 = (1, b, a, a^2)^t, \quad v_2 = (a, a+3, b, b^2)^t$.

It is not hard to show that these two vectors are linearly independent for every
choice of $a, b$, from which it follows that the rank of $A(p)$ is 2 for every point $(a, b)$.
For suppose that the vectors are dependent: $v_2 = cv_1$, or vice versa. Then the first
coordinates show that $v_2 = av_1$, hence

(8.3) $a + 3 = ab, \quad b = a^2, \quad b^2 = a^3$.

These equations have no common solutions: substituting $b = a^2$ into $b^2 = a^3$ gives
$a^4 = a^3$, so $a = 0$ or $a = 1$, and neither value satisfies $a + 3 = ab$. By Theorem (8.1),
$V$ is a free module of rank 2.
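The rank condition of Theorem (8.1) can be spot-checked for this example by evaluating the two columns at sample points and looking for a nonzero $2 \times 2$ minor. The sketch below does this; the function names are illustrative, and a check on finitely many points is only a sanity test, not a substitute for the argument in the text.

```python
from itertools import combinations


def columns_at(a, b):
    """The two columns of A(p) at p = (a, b), as given in the text."""
    return (1, b, a, a * a), (a, a + 3, b, b * b)


def rank_is_two(a, b):
    """True when some 2 x 2 minor of A(p) is nonzero, i.e. rank A(p) = 2."""
    v1, v2 = columns_at(a, b)
    return any(v1[i] * v2[j] - v1[j] * v2[i] != 0
               for i, j in combinations(range(4), 2))
```

Running `rank_is_two` over a grid of integer points, or at a complex point such as `(1j, -1)`, returns `True` each time, consistent with the rank of $A(p)$ being 2 everywhere.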

We can get an intuitive understanding for this theorem by considering the vector
space $V_p = \mathbb{C}^m/A(p)\mathbb{C}^n$ which is presented by the complex matrix $A(p)$. It is
natural to think of this vector space as a kind of "evaluation of the module $V$ at the
point $p$," and it can be shown that $V_p$ is essentially independent of the choice of the
presentation matrix. Therefore we can use the module $V$ to associate a vector space
$V_p$ to every point $p \in \mathbb{C}^k$. If we imagine moving the point $p$ about, then the vector
space $V_p$ will vary in a continuous way, providing that its dimension does not jump
around. This is because the matrix $A(p)$ presenting $V_p$ depends continuously on $p$.
Families of vector spaces of constant dimension, parametrized by a topological
space, are called vector bundles. The module is free if and only if the family of
vector spaces $V_p$ forms a vector bundle.


"By a distortion customary among mathematicians,
I was keeping to too restricted a point of view."

Jean-Louis Verdier 


EXERCISES 

1. The Definition of a Module

1. Let $R$ be a ring, considered as an $R$-module. Determine all module homomorphisms
$\varphi: R \to R$.

2. Let $W$ be a submodule of an $R$-module $V$. Prove that the additive inverse of an element of
$W$ is in $W$.

3. Let $\varphi: V \to W$ be a homomorphism of modules over a ring $R$, and let $V', W'$ be
submodules of $V, W$ respectively. Prove that $\varphi(V')$ is a submodule of $W$ and that
$\varphi^{-1}(W')$ is a submodule of $V$.

4. (a) Let $V$ be an abelian group. Prove that if $V$ has a structure of $\mathbb{Q}$-module with its given
law of composition as addition, then this structure is uniquely determined.

(b) Prove that no finite abelian group has a $\mathbb{Q}$-module structure.

5. Let $R = \mathbb{Z}[\alpha]$, where $\alpha$ is an algebraic integer. Prove that for any integer $m$, $R/mR$ is
finite, and determine its order.

6. A module is called simple if it is not the zero module and if it has no proper submodule.

(a) Prove that any simple module is isomorphic to $R/M$, where $M$ is a maximal ideal.

(b) Prove Schur's Lemma: Let $\varphi: S \to S'$ be a homomorphism of simple modules.
Then either $\varphi$ is zero, or else it is an isomorphism.

7. The annihilator of an $R$-module $V$ is the set $I = \{r \in R \mid rV = 0\}$.

(a) Prove that $I$ is an ideal of $R$.

(b) What is the annihilator of the $\mathbb{Z}$-module $\mathbb{Z}/(2) \times \mathbb{Z}/(3) \times \mathbb{Z}/(4)$? of the $\mathbb{Z}$-module
$\mathbb{Z}$?

8. Let $R$ be a ring and $V$ an $R$-module. Let $E$ be the set of endomorphisms of $V$, meaning
the set of homomorphisms from $V$ to itself. Prove that $E$ is a noncommutative ring, with
composition of functions as multiplication and with addition defined by
$[\varphi + \psi](m) = \varphi(m) + \psi(m)$.

9. Prove that the ring of endomorphisms of a simple module is a field.

10. Determine the ring of endomorphisms of the $R$-module (a) $R$ and (b) $R/I$, where $I$ is an
ideal.

11. Let $W \subset V \subset U$ be $R$-modules.

(a) Describe natural homomorphisms which relate the three quotient modules $U/W$,
$U/V$, and $V/W$.

(b) Prove the Third Isomorphism Theorem: $U/V \approx (U/W)/(V/W)$.

12. Let $V, W$ be submodules of a module $U$.

(a) Prove that $V \cap W$ and $V + W$ are submodules.

(b) Prove the Second Isomorphism Theorem: $(V + W)/W$ is isomorphic to $V/(V \cap W)$.

13. Let $V$ be an $R$-module, defined as in (1.1). If the ring $R$ is not commutative, it is not a
good idea to define $vr = rv$. Explain.

2. Matrices, Free Modules, and Bases

1. Let $R = \mathbb{C}[x, y]$, and let $M$ be the ideal of $R$ generated by the two elements $(x, y)$. Prove
or disprove: $M$ is a free $R$-module.

2. Let $A$ be an $n \times n$ matrix with coefficients in a ring $R$, let $\varphi: R^n \to R^n$ be left
multiplication by $A$, and let $d = \det A$. Prove or disprove: The image of $\varphi$ is equal to $dR^n$.

3. Let $I$ be an ideal of a ring $R$. Prove or disprove: If $R/I$ is a free $R$-module, then $I = 0$.

4. Let $R$ be a ring, and let $V$ be a free $R$-module of finite rank. Prove or disprove:

(a) Every set of generators contains a basis.

(b) Every linearly independent set can be extended to a basis.

5. Let $I$ be an ideal of a ring $R$. Prove that $I$ is a free $R$-module if and only if it is a principal
ideal, generated by an element $a$ which is not a zero divisor in $R$.

6. Prove that a ring $R$ such that every finitely generated $R$-module is free is either a field or
the zero ring.

7. Let $A$ be the matrix of a homomorphism $\varphi: \mathbb{Z}^n \to \mathbb{Z}^m$ between free modules.

(a) Prove that $\varphi$ is injective if and only if the rank of $A$ is $n$.

(b) Prove that $\varphi$ is surjective if and only if the greatest common divisor of the
determinants of the $m \times m$ minors of $A$ is 1.

8. Reconcile the definition of free abelian group given in Section 2 with that given in
Chapter 6, Section 8.

3. The Principle of Permanence of Identities

1. In each case, decide whether or not the principle of permanence of identities allows the
result to be carried over from the complex numbers to an arbitrary commutative ring.

(a) the associative law for matrix multiplication

(b) Cayley-Hamilton Theorem

(c) Cramer's Rule

(d) product rule, quotient rule, and chain rule for differentiation of polynomials

(e) the fact that a polynomial of degree $n$ has at most $n$ roots

(f) Taylor's expansion of a polynomial

2. Does the principle of permanence of identities show that $\det AB = \det A \det B$ when the
entries of the matrices are in a noncommutative ring $R$?

3. In some cases, it may be convenient to verify an identity only for the real numbers. Does
this suffice?

4. Let $R$ be a ring, and let $A$ be a $3 \times 3$ $R$-matrix in $SO_3(R)$, that is, such that $A^t A = I$ and
$\det A = 1$. Does the principle of permanence of identities show that $A$ has an eigenvector
in $R^3$ with eigenvalue 1?

4. Diagonalization of Integer Matrices 


1. Reduce each matrix below to diagonal form by integer row and column operations.


⑻ 


-1 2 


(b) 


1 2 3 

4 5 6 


(c) 


3 

2 

-4 


-3 


-4 


~2 


(d) In the first case, let $V = \mathbb{Z}^2$ and let $L = AV$. Draw the sublattice $L$, and find
commensurable bases of $V$ and $L$.

2. Let $A$ be a matrix whose entries are in the polynomial ring $F[t]$, and let $A'$ be obtained
from $A$ by polynomial row and column operations. Relate $\det A$ to $\det A'$.

3. Determine integer matrices $P^{-1}, Q$ which diagonalize the matrix A =

4. Let $d_1, d_2, \ldots$ be the integers referred to in Theorem (4.3).

(a) Prove that $d_1$ is the greatest common divisor of the entries $a_{ij}$ of $A$.

(b) Prove that $d_1 d_2$ is the greatest common divisor of the determinants of the $2 \times 2$
minors of $A$.

(c) State and prove an extension of (a) and (b) to $d_i$ for arbitrary $i$.

5. Determine all integer solutions to the system of equations $AX = 0$, when
$A = \begin{bmatrix} 4 & 7 & 2 \\ 2 & 4 & 6 \end{bmatrix}$.

6. Find a basis for the following submodules of $\mathbb{Z}^3$.

(a) The module generated by $(1, 0, -1)$, $(2, -3, 1)$, $(0, 3, 1)$, $(3, 1, 5)$.

(b) The module of solutions of the system of equations x + 2y + 3z = 0, 

x + + 9z = 0. 

7. Prove that the two matrices $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ generate the group $SL_2(\mathbb{Z})$ of integer
matrices with determinant 1.





8. Prove that the group $SL_n(\mathbb{Z})$ is generated by elementary integer matrices of the first type.

9. Let $\alpha, \beta, \gamma$ be complex numbers, and let $A = \{\ell\alpha + m\beta + n\gamma \mid \ell, m, n \in \mathbb{Z}\}$ be the
subgroup of $\mathbb{C}^+$ they generate. Under what conditions is $A$ a lattice in $\mathbb{C}$?

10. Let $\varphi: \mathbb{Z}^k \to \mathbb{Z}^k$ be a homomorphism given by multiplication by an integer matrix $A$.
Show that the image of $\varphi$ is of finite index if and only if $A$ is nonsingular and that if so,
then the index is equal to $|\det A|$.

11. (a) Let $A = (a_1, \ldots, a_n)^t$ be an integer column vector. Use row reduction to prove that
there is a matrix $P \in GL_n(\mathbb{Z})$ such that $PA = (d, 0, \ldots, 0)^t$, where $d$ is the greatest
common divisor of $a_1, \ldots, a_n$.

(b) Prove that if $d = 1$, then $A$ is the first column of a matrix $M \in SL_n(\mathbb{Z})$.

5. Generators and Relations for Modules

1. In each case, identify the abelian group which has the given presentation matrix:

2. Find a ring $R$ and an ideal $I$ of $R$ which is not finitely generated.

3. Prove that existence of factorizations holds in a noetherian integral domain.

4. Let $V \subset \mathbb{C}^n$ be the locus of zeros of an infinite set of polynomials $f_1, f_2, f_3, \ldots$. Prove
that there is a finite subset of these polynomials whose zeros define the same locus.

5. Let $S$ be a subset of $\mathbb{C}^n$. Prove that there is a finite set of polynomials $(f_1, \ldots, f_k)$ such
that any polynomial which vanishes identically on $S$ is a linear combination of this set,
with polynomial coefficients.

6. Determine a presentation matrix for the ideal $(2, 1 + \delta)$ of $\mathbb{Z}[\delta]$, where $\delta = \sqrt{-5}$.

*7. Let $S$ be a subring of the ring $R = \mathbb{C}[t]$ which contains $\mathbb{C}$ and is not equal to $\mathbb{C}$. Prove
that $R$ is a finitely generated $S$-module.

8. Let $A$ be the presentation matrix of a module $V$ with respect to a set of generators
$(v_1, \ldots, v_m)$. Let $(w_1, \ldots, w_r)$ be another set of elements of $V$, and write the elements in
terms of the generators, say $w_j = \sum_i p_{ij} v_i$, $p_{ij} \in R$. Let $P = (p_{ij})$. Prove that the block
matrix

$\begin{bmatrix} A & -P \\ 0 & I \end{bmatrix}$

is a presentation matrix for $V$ with respect to the set of generators
$(v_1, \ldots, v_m; w_1, \ldots, w_r)$.

*9* With the notation of the previous problem, suppose that (Wi ， … ， w r ) is also a set of gen¬ 
erators of V and that is a presentation matrix for V with respect to this set of generators. 
Say that Vi = is an expression of the generators Vi in terms of the wj. 


(a) Prove that the block matrix M = 


A 

一 p 

I 

0 

0 

I 

~Q 

B 


presents V with respect to the 


generators ( 仍， … ， D m ; Wi ， … ， mv). 

(b) Show that M can be reduced to A and to by a sequence of operations of the form 
(5.12). 

10. Using exercise 9, show that any presentation matrix of a module can be transformed to any other
by a sequence of operations (5.12) and their inverses.

6. The Structure Theorem for Abelian Groups

1. Find a direct sum of cyclic groups which is isomorphic to the abelian group presented by
the matrix

$$\begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 0 \\ 2 & 0 & 2 \end{bmatrix}.$$

2. Write the group generated by $x, y$, with the relation $3x + 4y = 0$, as a direct sum of
cyclic groups.

3. Find an isomorphic direct product of cyclic groups, when $V$ is the abelian group generated
by $x, y, z$ with the given relations.

(a) $3x + 2y + 8z = 0$, $2x + 4z = 0$

(b) $x + y = 0$, $2x = 0$, $4x + 2z = 0$, $4x + 2y + 2z = 0$

(c) $2x + y = 0$, $x - y + 3z = 0$

(d) $2x - 4y = 0$, $2x + 2y + z = 0$

(e) $7x + 5y + 2z = 0$, $3x + 3y = 0$, $13x + 11y + 2z = 0$

4. Determine the number of isomorphism classes of abelian groups of order 400. 

5. Classify finitely generated modules over each ring.

(a) $\mathbb{Z}/(4)$ (b) $\mathbb{Z}/(6)$ (c) $\mathbb{Z}/n\mathbb{Z}$.

6. Let $R$ be a ring, and let $V$ be an $R$-module, presented by a diagonal $m \times n$ matrix $A$:
$V \approx R^m / AR^n$. Let $(v_1, \dots, v_m)$ be the corresponding generators of $V$, and let $d_i$ be the
diagonal entries of $A$. Prove that $V$ is isomorphic to a direct product of the modules $R/(d_i)$.

7. Let $V$ be the $\mathbb{Z}[i]$-module generated by elements $v_1, v_2$ with relations
$(1 + i)v_1 + (2 - i)v_2 = 0$, $3v_1 + 5iv_2 = 0$. Write this module as a direct sum of cyclic modules.

8. Let $W_1, \dots, W_k$ be submodules of an $R$-module $V$ such that $V = \sum W_i$. Assume that
$W_1 \cap W_2 = 0$, $(W_1 + W_2) \cap W_3 = 0$, $\dots$, $(W_1 + W_2 + \cdots + W_{k-1}) \cap W_k = 0$. Prove
that $V$ is the direct sum of the modules $W_1, \dots, W_k$.

9. Prove the following.

(a) The number of elements of $\mathbb{Z}/(p^e)$ whose order divides $p^\nu$ is $p^\nu$ if $\nu \le e$, and is $p^e$
if $\nu \ge e$.

(b) Let $W_1, \dots, W_k$ be finite abelian groups, and let $u_j$ denote the number of elements of
$W_j$ whose order divides a given integer $q$. Then the number of elements of the
product group $V = W_1 \times \cdots \times W_k$ whose order divides $q$ is $u_1 \cdots u_k$.

(c) With the above notation, assume that $W_j$ is a cyclic group of prime power order
$d_j = p^{e_j}$. Let $r_1$ be the number of $d_j$ equal to a given prime $p$, let $r_2$ be the number of
$d_j$ equal to $p^2$, and so on. Then the number of elements of $V$ whose order divides $p^\nu$
is $p^{s_\nu}$, where $s_1 = r_1 + \cdots + r_k$, $s_2 = r_1 + 2r_2 + \cdots + 2r_k$,
$s_3 = r_1 + 2r_2 + 3r_3 + \cdots + 3r_k$, and so on.

(d) Theorem (6.9).


7. Application to Linear Operators


1. Let $T$ be a linear operator whose matrix is

2

0

Is the corresponding $\mathbb{C}[t]$-module cyclic?

2. Determine the Jordan form of the matrix $\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$.

3. Prove that $\begin{bmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 1 & 1 \end{bmatrix}$ is an idempotent matrix, and find its Jordan form.

4. Let $V$ be a complex vector space of dimension 5, and let $T$ be a linear operator on $V$
which has characteristic polynomial $(t - \alpha)^5$. Suppose that the rank of the operator
$T - \alpha I$ is 2. What are the possible Jordan forms for $T$?

5. Find all possible Jordan forms for a matrix whose characteristic polynomial is
$(t + 2)^2 (t - 5)^3$.

6. What is the Jordan form of a matrix whose characteristic polynomial is $(t - 2)^2 (t - 5)^3$
and such that the space of eigenvectors with eigenvalue 2 is one-dimensional, while the
space of eigenvectors with eigenvalue 5 is two-dimensional?

7. (a) Prove that a Jordan block has a one-dimensional space of eigenvectors. 

(b) Prove that, conversely, if the eigenvectors of a complex matrix $A$ are multiples of a
single vector, then the Jordan form for $A$ consists of one block.

8. Determine all invariant subspaces of a linear operator whose Jordan form consists of one 
block. 


9. In each case, solve the differential equation $dX/dt = AX$ when $A$ is the Jordan block
given.

(a) $\begin{bmatrix} 2 & 0 \\ 1 & 2 \end{bmatrix}$ (b) $\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$ (c) $\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$

10. Solve the differential equation $dX/dt = AX$ when $A$ is (a) the matrix (7.14), (b) the
matrix (7.10), (c) the matrix of problem 2, (d) the matrix of problem 3.

11. Prove or disprove: Two complex $n \times n$ matrices $A, B$ are similar if and only if they have
the same Jordan form.

12. Show that every complex $n \times n$ matrix is similar to a matrix of the form $D + N$, where $D$
is diagonal, $N$ is nilpotent, and $DN = ND$.

13. Let $R = F[x]$ be the polynomial ring in one variable over a field $F$, and let $V$ be the
$R$-module generated by an element $v$ which satisfies the relation $(x^3 + 3x + 2)v = 0$.
Choose a basis for $V$ as $F$-vector space, and find the matrix of the operator multiplication
by $x$ with respect to this basis.

14. Let $V$ be an $F[t]$-module, and let $\mathbf{B} = (v_1, \dots, v_n)$ be a basis for $V$, as $F$-vector space. Let
$B$ be the matrix of $T$ with respect to this basis. Prove that $A = tI - B$ is a presentation
matrix for the module.

15. Let $p(t)$ be a polynomial over a field $F$. Prove that there exists an $n \times n$ matrix with
entries in $F$ whose characteristic polynomial is $p(t)$.

16. Prove or disprove: A complex matrix $A$ such that $A^2 = A$ is diagonalizable.

17. Let $A$ be a complex $n \times n$ matrix such that $A^k = I$ for some $k$. Prove that the Jordan form
for $A$ is diagonal.

18. Prove the Cayley-Hamilton Theorem, that if $p(t)$ is the characteristic polynomial of an
$n \times n$ matrix $A$, then $p(A) = 0$.

19. The minimal polynomial $m(t)$ of a linear operator $T$ on a complex vector space $V$ is the
polynomial of lowest degree such that $m(T) = 0$.

(a) Prove that the minimal polynomial divides the characteristic polynomial.

(b) Prove that every root of the characteristic polynomial $p(t)$ is also a root of the
minimal polynomial $m(t)$.

(c) Prove that $T$ is diagonalizable if and only if $m(t)$ has no multiple root.

20. Find all possible Jordan forms for $8 \times 8$ matrices whose minimal polynomial is
$x^2 (x - 1)^3$.

21. Prove or disprove: A complex matrix $A$ is similar to its transpose.

22. Classify linear operators on a finitely generated $F[t]$-module, dropping the assumption
that the module is finite-dimensional as a vector space.

23. Prove that the ranks of $(A - \alpha I)^\nu$ distinguish all Jordan forms, and hence that the Jordan
form depends only on the operator and not on the basis.

24. Show that the following concepts are equivalent:

(i) $R$-module, where $R = \mathbb{Z}[i]$;

(ii) abelian group $V$, together with a homomorphism $\varphi: V \to V$ such that
$\varphi \circ \varphi = -\text{identity}$.

25. Let $F = \mathbb{F}_p$. For which prime integers $p$ does the additive group $F^1$ have a structure of
$\mathbb{Z}[i]$-module? How about $F^2$?

26. Classify finitely generated modules over the ring $\mathbb{C}[\epsilon]$, where $\epsilon^2 = 0$.


8. Free Modules over Polynomial Rings 



1. Determine whether or not the modules over $\mathbb{C}[x, y]$ presented by the following matrices
are free.

(a) $\begin{bmatrix} x^2 + 1 & x \\ x^2 y + x + y & xy + 1 \end{bmatrix}$ (b) $\begin{bmatrix} xy - 1 \\ x^2 - y^2 \\ y \end{bmatrix}$ (c) $\begin{bmatrix} x - 1 & x \\ y & y + 1 \\ x & y \\ x^2 & 2y \end{bmatrix}$

2. Prove that the module presented by (8.2) is free by exhibiting a basis.

3. Following the model of the polynomial ring in one variable, describe modules over the
ring $\mathbb{C}[x, y]$ in terms of real vector spaces with additional structure.

4. Let $R$ be a ring and $V$ an $R$-module. Let $I$ be an ideal of $R$, and let $IV$ be the set of finite
sums $\sum s_i v_i$, where $s_i \in I$ and $v_i \in V$.

(a) Show how to make $V/IV$ into an $R/I$-module.

(b) Let $A$ be a presentation matrix for $V$, and let $\bar{A}$ denote its residue in $R/I$. Prove that $\bar{A}$
is a presentation matrix for $V/IV$.

(c) Show why the module $V_p$ defined in the text is essentially independent of the
presentation matrix.


*5. Using exercise 9 of Section 5, prove the easy half of the theorem of Quillen and Suslin:
If $V$ is free, then the rank of $A(p)$ is constant.

6. Let $R = \mathbb{Z}[\sqrt{-5}]$, and let $V$ be the module presented by the matrix
$A = \begin{bmatrix} 2 \\ 1 + \delta \end{bmatrix}$, with $\delta = \sqrt{-5}$.

(a) Prove that the residue of $A$ has rank 1 for every prime ideal $P$ of $R$.

(b) Prove that $V$ is not free.


1. Let $G$ be a lattice group, and let $g$ be a rotation in $G$. Let $\bar{g}$ be the associated element of
the point group $\bar{G}$. Prove that there is a basis for $\mathbb{R}^2$, not necessarily an orthonormal
basis, such that the matrix of $\bar{g}$ with respect to this basis is in $GL_2(\mathbb{Z})$.

*2. (a) Let $\alpha$ be a complex number, and let $\mathbb{Z}[\alpha]$ be the subring of $\mathbb{C}$ generated by $\alpha$. Prove
that $\alpha$ is an algebraic integer if and only if $\mathbb{Z}[\alpha]$ is a finitely generated abelian group.

(b) Prove that if $\alpha, \beta$ are algebraic integers, then the subring $\mathbb{Z}[\alpha, \beta]$ of $\mathbb{C}$ which they
generate is a finitely generated abelian group.

(c) Prove that the algebraic integers form a subring of $\mathbb{C}$.

*3. Pick's Theorem: Let $A$ be the plane region bounded by a polygon whose vertices are at
integer lattice points. Let $I$ be the set of lattice points in the interior of $A$ and $B$ the set
of lattice points on the boundary of $A$. If $p$ is a lattice point, let $r(p)$ denote the fraction
of $2\pi$ of the angle subtended by $A$ at $p$. So $r(p) = 0$ if $p \notin A$, $r(p) = 1$ if $p$ is an
interior point of $A$, $r(p) = \frac{1}{2}$ if $p$ is on an edge, and so on.

(a) Prove that the area of $A$ is $\sum_p r(p)$.

(b) Prove that the area is $|I| + \frac{1}{2}(|B| - 2)$ if $A$ has a single connected boundary curve.

4. Prove that the integer orthogonal group $O_n(\mathbb{Z})$ is a finite group.

*5. Consider the space $V = \mathbb{R}^k$ of column vectors as an inner product space, with the
ordinary dot product $(v \cdot w) = v^t w$. Let $L$ be a lattice in $V$, and define
$L^* = \{ w \mid (v \cdot w) \in \mathbb{Z} \text{ for all } v \in L \}$.

(a) Show that $L^*$ is a lattice.

(b) Let $\mathbf{B} = (v_1, \dots, v_k)$ be a lattice basis for $L$, and let $P = [\mathbf{B}]^{-1}$ be the matrix relating
this basis of $V$ to the standard basis $\mathbf{E}$. What is the matrix $A$ of dot product with
respect to the basis $\mathbf{B}$?

(c) Show that the columns of $P^t$ form a lattice basis for $L^*$.

(d) Show that if $A$ is an integer matrix, then $L \subset L^*$, and $[L^* : L] = |\det A|$.

6. Let $V$ be a real vector space having a countably infinite basis $\{v_1, v_2, v_3, \dots\}$, and let $E$ be
the ring of linear operators on $V$.

(a) Which infinite matrices represent linear operators on $V$?

(b) Describe how to compute the matrix of the composition of two linear operators in
terms of the matrix of each of them.

(c) Consider the linear operators $T, T'$ defined by the rules

$T(v_{2n}) = v_n$, $T(v_{2n-1}) = 0$, $T'(v_{2n}) = 0$, $T'(v_{2n-1}) = v_n$, $n = 1, 2, 3, \dots$.

Write down their matrices.

(d) We can consider $E^1 = E$ as a module over the ring $E$, with scalar multiplication on
the left side of a vector. Show that $\{T, T'\}$ is a basis of $E^1$ as $E$-module.

(e) Prove that the free modules $E^k$, $k = 1, 2, 3, \dots$, are all isomorphic.

7. Prove that the group $\mathbb{Q}^+/\mathbb{Z}^+$ is not an infinite direct sum of cyclic groups.

8. Prove that the additive group $\mathbb{Q}^+$ of rational numbers is not a direct sum of two proper
subgroups.

9. Prove that the multiplicative group $\mathbb{Q}^\times$ of rational numbers is isomorphic to the direct
sum of a cyclic group of order 2 and a free abelian group with countably many generators.

10. Prove that two diagonalizable matrices are simultaneously diagonalizable, that is, that
there is an invertible matrix $P$ such that $PAP^{-1}$ and $PBP^{-1}$ are both diagonal, if and only
if $AB = BA$.

*11. Let $A$ be a finite abelian group, and let $\varphi: A \to \mathbb{C}^\times$ be a homomorphism which is not
the trivial homomorphism ($\varphi(x) = 1$ for all $x$). Prove that $\sum_{a \in A} \varphi(a) = 0$.

12. Let $A$ be an $m \times n$ matrix with coefficients in a ring $R$, and let $\varphi: R^n \to R^m$ be left
multiplication by $A$. Prove that the following are equivalent:

(i) $\varphi$ is surjective;

(ii) the determinants of the $m \times m$ minors of $A$ generate the unit ideal;

(iii) $A$ has a right inverse, a matrix $B$ with coefficients in $R$ such that $AB = I$.

*13. Let $(v_1, \dots, v_n)$ be generators for an $R$-module $V$, and let $J$ be an ideal of $R$. Define $JV$ to
be the set of all finite sums of products $av$, $a \in J$, $v \in V$.

(a) Show that if $JV = V$, there is an $n \times n$ matrix $A$ with entries in $J$ such that
$(v_1, \dots, v_n)(I - A) = 0$.

(b) With the notation of (a), show that $\det(I - A) = 1 + \alpha$, where $\alpha \in J$, and that
$\det(I - A)$ annihilates $V$.

(c) An $R$-module $V$ is called faithful if $rV = 0$ for $r \in R$ implies $r = 0$. Prove the
Nakayama Lemma: Let $V$ be a finitely generated, faithful $R$-module, and let $J$ be an
ideal of $R$. If $JV = V$, then $J = R$.

(d) Let $V$ be a finitely generated $R$-module. Prove that if $MV = V$ for all maximal ideals
$M$, then $V = 0$.

*14. We can use a pair $x(t), y(t)$ of complex polynomials in $t$ to define a complex path in $\mathbb{C}^2$,
by sending $t \mapsto (x(t), y(t))$. They also define a homomorphism $\varphi: \mathbb{C}[x, y] \to \mathbb{C}[t]$,
by $f(x, y) \mapsto f(x(t), y(t))$. This exercise analyzes the relationship between the path and
the homomorphism. Let's rule out the trivial case that $x(t), y(t)$ are both constant.

(a) Let $S$ denote the image of $\varphi$. Prove that $S$ is isomorphic to the quotient $\mathbb{C}[x, y]/(f)$,
where $f(x, y)$ is an irreducible polynomial.

(b) Prove that $t$ is the root of a monic polynomial with coefficients in $S$.

(c) Let $V$ denote the variety of zeros of $f$ in $\mathbb{C}^2$. Prove that for every point $(x_0, y_0) \in V$,
there is a $t_0 \in \mathbb{C}$ such that $(x_0, y_0) = (x(t_0), y(t_0))$.


Our difficulty is not in the proofs, but in learning what to prove.

Emil Artin


1. EXAMPLES OF FIELDS

Much of the theory of fields has to do with a pair $F \subset K$ of fields, one contained in
the other. In contrast with group theory, where subgroups play an important role,
we usually consider $K$ as an extension of $F$; that is, $F$ is considered to be the basic
field, and $K$ is related to it. An extension field of $F$ is a field which contains $F$ as a
subfield.

Here are the three most important classes of fields. 

(1.1) Number fields. A number field $K$ is a subfield of $\mathbb{C}$.

Any subfield of $\mathbb{C}$ contains 1, and hence it contains the field $\mathbb{Q}$ of rational numbers.
So a number field is an extension of $\mathbb{Q}$. The number fields most commonly studied
are algebraic number fields, all of whose elements are algebraic numbers (see Chapter
10, Section 1). We studied quadratic number fields in Chapter 11.

(1.2) Finite fields. A field having finitely many elements is called a finite field. 

If $K$ is a finite field, then the kernel of the unique homomorphism $\varphi: \mathbb{Z} \to K$ is a
prime ideal [Chapter 11 (7.15)], and since $\mathbb{Z}$ is infinite while $K$ is finite, the kernel is
not zero. Therefore it is generated by a prime integer $p$. The image of $\varphi$ is isomorphic
to the quotient $\mathbb{Z}/(p) = \mathbb{F}_p$. So $K$ contains a subfield isomorphic to the prime
field $\mathbb{F}_p$, and therefore it can be viewed as an extension of this prime field. We will
describe all finite fields in Section 6.

(1.3) Function fields. Certain extensions of the field $F = \mathbb{C}(x)$ of rational functions
are called function fields.

Function fields play an important role in the theory of analytic functions and in algebraic
geometry. Since we haven't seen them before, we will describe them briefly
here. A function field can be defined by an irreducible polynomial in two variables,
say $f(x, y) \in \mathbb{C}[x, y]$. The polynomial $f(x, y) = y^2 - x^3 + x$ is a good example.
Given such a polynomial $f$, we may study the equation

(1.4) $f(x, y) = 0$

analytically, using it to define $y$ "implicitly" as a function $y(x)$ of $x$ as we learn to do
in calculus. In our example, the function defined in this way is $y(x) = \sqrt{x^3 - x}$. This
function isn't single valued; it is determined only up to sign, but that isn't a serious
difficulty. We won't have an explicit expression for such a function in general, but
by definition, it satisfies the equation (1.4), that is,

(1.5) $f(x, y(x)) = 0$.

On the other hand, the equation can also be studied algebraically. Let us interpret
$f(x, y)$ as a polynomial in $y$ whose coefficients are polynomials in $x$. Let $F$ denote
the field $\mathbb{C}(x)$ of rational functions in $x$. If $f$ is not a polynomial in $x$ alone, then
since it is irreducible in $\mathbb{C}[x, y]$, it will be an irreducible element of $F[y]$ [Chapter
11 (3.9)]. Therefore the ideal generated by $f$ in $F[y]$ is maximal [Chapter 11 (1.6)],
and $F[y]/(f) = K$ is an extension field of $F$.

The analysis and the algebra are related, because both the implicitly defined
function $y(x)$ and the residue $\bar{y}$ of $y$ in $F[y]/(f)$ satisfy the equation $f(x, y) = 0$. In
this way, the residue of $y$, and indeed all elements of $K$, can be interpreted as functions
of the variable $x$. Because of this, such fields are called function fields. We will
discuss function fields in Section 7.


2. ALGEBRAIC AND TRANSCENDENTAL ELEMENTS

Let $K$ be an extension of a field $F$, and let $\alpha$ be an element of $K$. In analogy with the
definition of algebraic numbers (Chapter 10, Section 1), $\alpha$ is said to be algebraic
over $F$ if it is the root of some nonzero polynomial with coefficients in $F$. Since the
coefficients are from a field, we may assume that the polynomial is monic, say

(2.1) $x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, with $a_i \in F$.

An element $\alpha$ is called transcendental over $F$ if it is not algebraic over $F$, that is, if
it is not a root of any such polynomial.

Note that the two properties, algebraic and transcendental, depend on the
given field $F$. For example, the complex number $2\pi i$ is algebraic over the field of
real numbers but transcendental over the field of rational numbers. Also, every element
$\alpha$ of a field $K$ is algebraic over $K$, because it is the root of the polynomial
$x - \alpha$, which has coefficients in $K$.

The two possibilities for $\alpha$ can be described in terms of the substitution
homomorphism

(2.2) $\varphi: F[x] \to K$, which maps $f(x) \mapsto f(\alpha)$.

The element $\alpha$ is transcendental over $F$ if $\varphi$ is injective and algebraic over $F$ otherwise,
that is, if the kernel of $\varphi$ is not zero.

Assume that $\alpha$ is algebraic over $F$. Since $F[x]$ is a principal ideal domain,
$\ker \varphi$ is generated by a single element $f(x)$, the monic polynomial of lowest degree
having $\alpha$ as a root. Since $K$ is a field, we know that $f(x)$ must be an irreducible
polynomial [Chapter 11 (7.15)], and in fact it will be the only irreducible monic polynomial
in the ideal. Every other element of the ideal is a multiple of $f(x)$. We will call
this polynomial $f$ the irreducible polynomial for $\alpha$ over $F$.

It is important to note that this irreducible polynomial $f$ depends on $F$ as well
as on $\alpha$, because irreducibility of a polynomial depends on the field. For example,
let $F = \mathbb{Q}[i]$, and let $\alpha$ be the complex number $\sqrt{i} = \frac{1}{2}\sqrt{2}(1 + i)$. The irreducible
polynomial for $\alpha$ over $\mathbb{Q}$ is $x^4 + 1$, but this polynomial factors in the field
$F$: $x^4 + 1 = (x^2 + i)(x^2 - i)$. The irreducible polynomial for $\alpha$ over $F$ is $x^2 - i$.
When there are several fields around, we must be careful to make it clear to which
field we refer. To say that a polynomial is irreducible is ambiguous. It is better to say
that $f$ is irreducible over $F$, or that it is an irreducible element of $F[x]$.
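
The factorization in this example is easy to check by machine. The following is only a sketch under stated assumptions: polynomials are represented as coefficient lists, lowest degree first, Gaussian-integer coefficients are stored as Python complex numbers, and the helper name `poly_mul` is ours, not notation from the text.

```python
# Multiply polynomials given as coefficient lists (lowest degree first).
def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

# Coefficients in Z[i] are stored as Python complex numbers.
left = [1j, 0, 1]     # x^2 + i
right = [-1j, 0, 1]   # x^2 - i
product = poly_mul(left, right)   # coefficients of (x^2 + i)(x^2 - i)

# Floating-point value of sqrt(i) = (1/2) sqrt(2) (1 + i).
alpha = (2 ** 0.5 / 2) * (1 + 1j)
residual_over_F = abs(alpha ** 2 - 1j)   # alpha is a root of x^2 - i
residual_over_Q = abs(alpha ** 4 + 1)    # alpha is a root of x^4 + 1
```

The product comes out as the coefficient list of $x^4 + 1$, and both residuals vanish up to rounding. The check confirms the factorization only; it does not by itself prove that $x^4 + 1$ is irreducible over $\mathbb{Q}$.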

The field extension of $F$ which is generated by an element $\alpha \in K$ will be denoted
by $F(\alpha)$:

(2.3) $F(\alpha)$ is the smallest field containing $F$ and $\alpha$.

More generally, if $\alpha_1, \dots, \alpha_n$ are elements of an extension field $K$ of $F$, then the
notation $F(\alpha_1, \dots, \alpha_n)$ will stand for the smallest subfield of $K$ which contains $F$ and these
elements.

As in Chapter 10, we denote the ring generated by $\alpha$ over $F$ by $F[\alpha]$. It consists
of all elements of $K$ which can be written as polynomials in $\alpha$ with coefficients
in $F$:

(2.4) $a_n \alpha^n + \cdots + a_1 \alpha + a_0$, $a_i \in F$.

The field $F(\alpha)$ is isomorphic to the field of fractions of $F[\alpha]$. Its elements are ratios
of elements of the form (2.4) [see Chapter 10 (6.7)].

(2.5) Proposition. If $\alpha$ is transcendental over $F$, then the map $F[x] \to F[\alpha]$
is an isomorphism, and hence $F(\alpha)$ is isomorphic to the field $F(x)$ of rational
functions. □

This simple fact has the consequence that the field extensions $F(\alpha)$ are isomorphic
for all transcendental elements $\alpha$, because they are all isomorphic to the field of
rational functions $F(x)$. For instance, $\pi$ and $e$ are both transcendental over $\mathbb{Q}$
(though we have not proved that they are). Therefore $\mathbb{Q}(\pi)$ and $\mathbb{Q}(e)$ are isomorphic
fields, the isomorphism carrying $\pi$ to $e$. This is rather surprising at first glance. The
isomorphism is not continuous when the fields are regarded as subfields of the real
numbers.

The situation is quite different if $\alpha$ is algebraic:

(2.6) Proposition.

(a) Suppose that $\alpha$ is algebraic over $F$, and let $f(x)$ be its irreducible polynomial
over $F$. The map $F[x]/(f) \to F[\alpha]$ is an isomorphism, and $F[\alpha]$ is a field.
Thus $F[\alpha] = F(\alpha)$.

(b) More generally, let $\alpha_1, \dots, \alpha_n$ be algebraic elements of a field extension $K$ of $F$.
Then $F[\alpha_1, \dots, \alpha_n] = F(\alpha_1, \dots, \alpha_n)$.

Proof. Let $\varphi$ be the map (2.2), with $K = F(\alpha)$. Since $f(x)$ generates $\ker \varphi$,
we know that $F[x]/(f)$ is isomorphic to the image of $\varphi$ [Chapter 10 (3.1)], which is
$F[\alpha]$. Since $f$ is irreducible, it generates a maximal ideal [Chapter 11 (1.6)]. This
shows that $F[\alpha]$ is a field. Since $F(\alpha)$ is isomorphic to the fraction field of $F[\alpha]$, it
is equal to $F[\alpha]$. We leave the proof of the second part as an exercise. □

(2.7) Proposition. Let $\alpha$ be an algebraic element over $F$, and let $f(x)$ be its irreducible
polynomial. Suppose $f(x)$ has degree $n$. Then $(1, \alpha, \dots, \alpha^{n-1})$ is a basis for
$F[\alpha]$ as a vector space over $F$.

Proof. This proposition is a special case of (5.7) in Chapter 10. □

It may not be easy to tell whether or not two algebraic elements $\alpha, \beta$ generate
isomorphic fields, though we can use Proposition (2.7) to give a necessary condition:
Their irreducible polynomials over $F$ must have the same degree, because this
degree is the dimension of the field extension as an $F$-vector space. This is obviously
not a sufficient condition. For example, all the imaginary quadratic fields studied in
Chapter 11 are obtained by adjoining elements $\delta$ whose irreducible polynomials
$x^2 - d$ have degree 2, but they aren't all isomorphic. On the other hand, if $\alpha$ is a
root of $x^3 - x + 1$, then $\beta = \alpha^2$ is a root of $x^3 - 2x^2 + x - 1$. The two fields
$\mathbb{Q}(\alpha)$ and $\mathbb{Q}(\beta)$ are actually equal, though if we were presented only with the two
polynomials, it might take us some time to notice how they are related.
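
The claim about $\beta = \alpha^2$ can be verified by exact arithmetic in $\mathbb{Q}[x]/(x^3 - x + 1)$. The sketch below assumes that elements are stored as coordinate triples with respect to the basis $(1, \alpha, \alpha^2)$ of Proposition (2.7); the function name `mul` and the variable names are illustrative choices.

```python
def mul(u, v):
    """Product of u = (u0, u1, u2) and v, read as u0 + u1*alpha + u2*alpha^2,
    reduced with the relation alpha^3 = alpha - 1."""
    raw = [0] * 5
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            raw[i + j] += a * b
    # alpha^k = alpha^(k-2) - alpha^(k-3) for k = 4, 3
    for k in (4, 3):
        c = raw[k]
        raw[k] = 0
        raw[k - 2] += c
        raw[k - 3] -= c
    return tuple(raw[:3])

alpha = (0, 1, 0)
beta = mul(alpha, alpha)    # alpha^2
beta2 = mul(beta, beta)     # alpha^4
beta3 = mul(beta2, beta)    # alpha^6

one = (1, 0, 0)
# beta^3 - 2*beta^2 + beta - 1, computed coordinate by coordinate
value = tuple(b3 - 2 * b2 + b1 - c
              for b3, b2, b1, c in zip(beta3, beta2, beta, one))
```

Here `value` is the zero triple, so $\beta$ is a root of $x^3 - 2x^2 + x - 1$. Since $\beta \in \mathbb{Q}(\alpha)$ and both fields have degree 3 over $\mathbb{Q}$ once the second cubic is known to be irreducible, the equality $\mathbb{Q}(\alpha) = \mathbb{Q}(\beta)$ follows.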

What we can describe easily are the circumstances under which there is an
isomorphism

(2.8) $F(\alpha) \xrightarrow{\sim} F(\beta)$

which fixes $F$ and sends $\alpha$ to $\beta$. The following proposition is fundamental to our
understanding of field extensions:

(2.9) Proposition. Let $\alpha \in K$ and $\beta \in L$ be algebraic elements of two extension
fields of $F$. There is an isomorphism of fields

$\sigma: F(\alpha) \xrightarrow{\sim} F(\beta)$

which is the identity on the subfield $F$ and which sends $\alpha \mapsto \beta$ if and only if the
irreducible polynomials for $\alpha$ and $\beta$ over $F$ are equal.

Proof. Assume that $f(x)$ is the irreducible polynomial for $\alpha$ and for $\beta$ over $F$.
We apply Proposition (2.6), obtaining two isomorphisms

$\varphi: F[x]/(f) \xrightarrow{\sim} F[\alpha]$ and $\psi: F[x]/(f) \xrightarrow{\sim} F[\beta]$.

The composed map $\sigma = \psi \varphi^{-1}$ is the required isomorphism. Conversely, if there is
an isomorphism $\sigma$ sending $\alpha$ to $\beta$ which is the identity on $F$, and if $f(x) \in F[x]$ is a
polynomial such that $f(\alpha) = 0$, then $f(\beta) = 0$ too [see Proposition (2.11)]. Hence
the two elements have the same irreducible polynomial. □

(2.10) Definition. Let $K$ and $K'$ be two extensions of the same field $F$. An isomorphism
$\varphi: K \to K'$ which restricts to the identity on the subfield $F$ is called an
isomorphism of field extensions, or an $F$-isomorphism. Two extensions $K, K'$ of a
field $F$ are said to be isomorphic field extensions if there exists an $F$-isomorphism
$\varphi: K \to K'$.

(2.11) Proposition. Let $\varphi: K \to K'$ be an isomorphism of field extensions of $F$,
and let $f(x)$ be a polynomial with coefficients in $F$. Let $\alpha$ be a root of $f$ in $K$, and let
$\alpha' = \varphi(\alpha)$ be its image in $K'$. Then $\alpha'$ is also a root of $f$.

Proof. Say that $f(x) = a_n x^n + \cdots + a_1 x + a_0$. Then $\varphi(a_i) = a_i$ and $\varphi(\alpha) = \alpha'$.
Since $\varphi$ is a homomorphism, we can expand as follows:

$0 = \varphi(0) = \varphi(f(\alpha)) = \varphi(a_n \alpha^n + \cdots + a_1 \alpha + a_0)$

$= \varphi(a_n)\varphi(\alpha)^n + \cdots + \varphi(a_1)\varphi(\alpha) + \varphi(a_0)$

$= a_n \alpha'^n + \cdots + a_1 \alpha' + a_0$.

This shows that $\alpha'$ is a root of $f$. □

For example, the polynomial $x^3 - 2$ is irreducible over $\mathbb{Q}$. Let $\alpha$ denote the
real cube root of 2, and let $\zeta = e^{2\pi i/3}$ be a complex cube root of 1. The three complex
roots of $x^3 - 2$ are $\alpha$, $\zeta\alpha$, and $\zeta^2\alpha$. Therefore there is an isomorphism

(2.12) $\mathbb{Q}(\alpha) \xrightarrow{\sim} \mathbb{Q}(\zeta\alpha)$

sending $\alpha$ to $\zeta\alpha$. In this case the elements of $\mathbb{Q}(\alpha)$ are all real numbers, but $\mathbb{Q}(\zeta\alpha)$ is
not a subfield of $\mathbb{R}$. To understand the isomorphism (2.12), we must stop viewing
these fields as subfields of $\mathbb{C}$ and look only at their internal algebraic structure.

3. THE DEGREE OF A FIELD EXTENSION 

An extension $K$ of a field $F$ can always be regarded as an $F$-vector space. Addition is
the addition law in $K$, and scalar multiplication of an element $\alpha$ of $K$ by an element
$c$ of $F$ is defined to be the product $c\alpha$ formed by multiplying these two elements in
$K$. The dimension of $K$ as an $F$-vector space is called the degree of the field extension
$F \subset K$. The degree is the simplest invariant of an extension, but though simple,
it is important. It will be denoted by

(3.1) $[K : F]$ = dimension of $K$, as an $F$-vector space.

For example, $\mathbb{C}$ has the $\mathbb{R}$-basis $(1, i)$, so $[\mathbb{C} : \mathbb{R}] = 2$.

A field extension $F \subset K$ is called a finite extension if its degree $[K : F]$ is
finite. Extensions of degree 2 are also called quadratic extensions, those of degree 3
are called cubic extensions, and so on. The degree of an extension $F \subset K$ is 1 if and
only if $F = K$.

The term degree comes from the case that $K = F(\alpha)$ is generated by one algebraic
element $\alpha$. In that case, $K$ has the basis $(1, \alpha, \dots, \alpha^{n-1})$, where $n$ is the degree
of the irreducible polynomial for $\alpha$ over $F$ [Proposition (2.7)]. Thus we find the first
important property of the degree:

(3.2) Proposition. If $\alpha$ is algebraic over $F$, then $[F(\alpha) : F]$ is the degree of the
irreducible polynomial for $\alpha$ over $F$. □

This degree is also called the degree of $\alpha$ over $F$. Note that an element $\alpha$ has degree
1 over $F$ if and only if it is an element of $F$, and $\alpha$ has degree $\infty$ if and only if it is
transcendental over $F$.

Extensions of degree 2 are easy to describe. 

(3.3) Proposition. Assume that the field $F$ does not have characteristic 2, that is,
that $1 + 1 \ne 0$ in $F$. Then any extension $F \subset K$ of degree 2 can be obtained by adjoining
a square root: $K = F(\delta)$, where $\delta^2 = D$ is an element of $F$. Conversely, if $\delta$
is an element of an extension of $F$, and if $\delta^2 \in F$ but $\delta \notin F$, then $F(\delta)$ is a
quadratic extension.

Proof. We first show that every quadratic extension is obtained by adjoining a
root of a quadratic polynomial $f(x) \in F[x]$. To do this, we choose any element $\alpha$ of
$K$ which is not in $F$. Then $(1, \alpha)$ is a linearly independent set over $F$. Since $K$ has
dimension 2 as a vector space over $F$, $(1, \alpha)$ is a basis for $K$ over $F$, and $K = F[\alpha]$.
It follows that $\alpha^2$ is a linear combination of $(1, \alpha)$, say $\alpha^2 = -b\alpha - c$, with
$b, c \in F$. Then $\alpha$ is a root of $f(x) = x^2 + bx + c$.

Since $2 \ne 0$ in $F$, we can use the quadratic formula $\alpha = \frac{1}{2}(-b + \sqrt{b^2 - 4c})$ to
solve the equation $x^2 + bx + c = 0$. This is proved by direct calculation. There are
two choices for the square root, one of which gives our chosen root $\alpha$. Let $\delta$ denote
that choice: $\delta = \sqrt{b^2 - 4c} = 2\alpha + b$. Then $\delta$ is in $K$, and it also generates $K$
over $F$. Its square is the discriminant $b^2 - 4c$, which is in $F$.

The last assertion of the proposition is clear. □
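
The "direct calculation" in the proof, namely that $\delta = 2\alpha + b$ squares to $b^2 - 4c$, can be checked mechanically. The following sketch assumes $F = \mathbb{Q}$ (modeled by `Fraction`) and stores an element $u_0 + u_1\alpha$ of $F[\alpha]$ as the pair `(u0, u1)`; the sample values of $b, c$ are arbitrary and the helper names are ours.

```python
from fractions import Fraction

def mul(u, v, b, c):
    """Product in F[alpha] with alpha^2 = -b*alpha - c; (u0, u1) means u0 + u1*alpha."""
    u0, u1 = u
    v0, v1 = v
    return (u0 * v0 - c * u1 * v1, u0 * v1 + u1 * v0 - b * u1 * v1)

def delta_squared(b, c):
    delta = (b, Fraction(2))   # delta = 2*alpha + b
    return mul(delta, delta, b, c)

samples = [
    (Fraction(1), Fraction(1)),
    (Fraction(3), Fraction(-5)),
    (Fraction(1, 2), Fraction(7, 3)),
]
results = [delta_squared(b, c) for b, c in samples]
```

In each case the result is the pair $(b^2 - 4c, 0)$: the coefficient of $\alpha$ cancels, so $\delta^2$ lies in $F$. The computation uses only the relation $\alpha^2 = -b\alpha - c$, so it does not depend on whether the quadratic is irreducible.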

The second important property of the degree is that it is multiplicative in 
towers of fields. 

(3.4) Theorem. Let $F \subset K \subset L$ be fields. Then $[L : F] = [L : K][K : F]$.

Proof. Let $B = (y_1, \dots, y_n)$ be a basis for $L$ as a $K$-vector space, and let
$C = (x_1, \dots, x_m)$ be a basis for $K$ as an $F$-vector space. So $[L : K] = n$ and
$[K : F] = m$. We will show that the set of $mn$ products $P = (\dots, x_i y_j, \dots)$ is a basis
of $L$ as an $F$-vector space, and this will prove the theorem. The same reasoning
will work if $B$ or $C$ is infinite.

Let $\alpha$ be an element of $L$. Since $B$ is a basis for $L$ over $K$, we can write
$\alpha = \beta_1 y_1 + \cdots + \beta_n y_n$, with $\beta_j \in K$, in a unique way. Since $C$ is a basis for $K$
over $F$, each $\beta_j$ can be expressed uniquely, as $\beta_j = a_{1j} x_1 + \cdots + a_{mj} x_m$, with
$a_{ij} \in F$. Thus $\alpha = \sum_{i,j} a_{ij} x_i y_j$. This shows that $P$ spans $L$ as an $F$-vector space. We
know that $\beta_j$ is uniquely determined by $\alpha$, and since $C$ is a basis for $K$ over $F$, the
elements $a_{ij}$ are uniquely determined by $\beta_j$. So they are uniquely determined by $\alpha$.
This shows that $P$ is linearly independent, and hence that it is a basis for $L$ over $F$. □

One important case of a tower of field extensions is that $K$ is a given extension
of $F$ and $\alpha$ is an element of $K$. Then the field $F(\alpha)$ generated by $\alpha$ is an intermediate
field:

(3.5) $F \subset F(\alpha) \subset K$.

(3.6) Corollary. Let $K$ be an extension of $F$, of finite degree $n$. Let $\alpha$ be an element
of $K$. Then $\alpha$ is algebraic over $F$, and its degree divides $n$.

To see this, we apply Theorem (3.4) to the fields $F \subset F(\alpha) \subset K$ and use the fact
that the degree of $\alpha$ over $F$ is $[F(\alpha) : F]$ if $\alpha$ is algebraic, while $[F(\alpha) : F] = \infty$ if $\alpha$
is transcendental. □

Here are some sample applications: 

(3.7) Corollary. Let $K$ be a field extension of $F$ of prime degree $p$, and let $\alpha$ be an
element of $K$ which is not in $F$. Then $\alpha$ has degree $p$ over $F$, and $K = F(\alpha)$.

For, $p = [K : F] = [K : F(\alpha)][F(\alpha) : F]$. One of the terms on the right side is 1.
Since $\alpha \notin F$, it is not the second term, so $[K : F(\alpha)] = 1$ and $[F(\alpha) : F] = p$.
Therefore $K = F(\alpha)$. □

(3.8) Corollary. Every irreducible polynomial in $\mathbb{R}[x]$ has degree 1 or 2.

We proved this in Chapter 11, Section 1, but let us derive it once more: Let $g$ be an
irreducible real polynomial. Then $g$ has a root $\alpha$ in $\mathbb{C}$. Since $[\mathbb{C} : \mathbb{R}] = 2$, the degree
of $\alpha$ over $\mathbb{R}$ divides 2, by (3.6). Therefore the degree of $g$ is 1 or 2. □

(3.9) Examples.

(a) Let $\alpha = \sqrt[3]{2}$, $\beta = \sqrt[4]{5}$. Consider the field $L = \mathbb{Q}(\alpha, \beta)$ obtained by adjoining
$\alpha$ and $\beta$ to $\mathbb{Q}$. Then $[L : \mathbb{Q}] = 12$. For $L$ contains the subfield $\mathbb{Q}(\alpha)$,
which has degree 3 over $\mathbb{Q}$, because the irreducible polynomial for $\alpha$ over $\mathbb{Q}$ is
$x^3 - 2$. Therefore 3 divides $[L : \mathbb{Q}]$. Similarly, $L$ contains $\mathbb{Q}(\beta)$ and $\beta$ has degree
4 over $\mathbb{Q}$, so 4 divides $[L : \mathbb{Q}]$. On the other hand, the degree of $\beta$ over
the field $\mathbb{Q}(\alpha)$ is at most 4, because $\beta$ is a root of $x^4 - 5$, and this polynomial
has coefficients in $\mathbb{Q}(\alpha)$. The chain of fields $L = \mathbb{Q}(\alpha, \beta) \supset \mathbb{Q}(\alpha) \supset \mathbb{Q}$ shows
that $[L : \mathbb{Q}]$ is at most 12. So $[L : \mathbb{Q}] = 12$.

(b) It follows by reducing modulo 2 that the polynomial $f(x) = x^4 + 2x^3 +
6x^2 + x + 9$ is irreducible over $\mathbb{Q}$ [Chapter 11 (4.3)]. Let $\gamma$ be a root of $f(x)$.
Then there is no way to express $\alpha = \sqrt[3]{2}$ rationally in terms of $\gamma$, that is,
$\alpha \notin \mathbb{Q}(\gamma)$. For $[\mathbb{Q}(\alpha) : \mathbb{Q}] = 3$, $[\mathbb{Q}(\gamma) : \mathbb{Q}] = 4$, and 3 does not divide 4. So
we can't have $\mathbb{Q}(\gamma) \supset \mathbb{Q}(\alpha)$. On the other hand, since $i$ has degree 2 over $\mathbb{Q}$,
it is not so easy to decide whether $i$ is in $\mathbb{Q}(\gamma)$. (In fact, it is not.) □
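
The squeeze argument of Example (3.9a) uses only two facts: both degrees divide $[L : F]$, and the tower bounds $[L : F]$ by their product. A minimal sketch of that bookkeeping follows; the function name `degree_bounds` is ours, and the function returns bounds only, not the degree itself.

```python
from math import gcd

def degree_bounds(d1, d2):
    """Bounds on [F(alpha, beta) : F] when alpha, beta have degrees d1, d2 over F."""
    lower = d1 * d2 // gcd(d1, d2)   # lcm(d1, d2): both degrees divide [L : F]
    upper = d1 * d2                  # tower F < F(alpha) < F(alpha, beta)
    return lower, upper

bounds = degree_bounds(3, 4)         # alpha = cube root of 2, beta = fourth root of 5
forced = bounds[0] == bounds[1]      # True exactly when the bounds pin down the degree
```

When the two degrees are relatively prime, as with 3 and 4, the bounds coincide and the degree is forced. For degrees such as 2 and 4 they differ, and further work is needed to decide between the possibilities.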

The next two theorems state the most important abstract consequences of the 
multiplicative property of degrees. 

(3.10) Theorem. Let $K$ be an extension of $F$. The elements of $K$ which are algebraic
over $F$ form a subfield of $K$.

Proof. Let $\alpha, \beta$ be algebraic elements of $K$. We must show that $\alpha + \beta$, $\alpha\beta$,
$-\alpha$, and $\alpha^{-1}$ (if $\alpha \ne 0$) are algebraic too. We note that since $\alpha$ is algebraic,
$[F(\alpha) : F] < \infty$. Moreover, $\beta$ is algebraic over $F$, and hence it is also algebraic over
the bigger field $F(\alpha)$. Therefore the field $F(\alpha, \beta)$, which is generated over $F(\alpha)$ by
$\beta$, is a finite extension of $F(\alpha)$, that is, $[F(\alpha, \beta) : F(\alpha)] < \infty$. By Theorem (3.4),
$[F(\alpha, \beta) : F]$ is finite too. Therefore every element of $F(\alpha, \beta)$ is algebraic over $F$
(3.6). The elements $\alpha + \beta$, $\alpha\beta$, etc. all lie in $F(\alpha, \beta)$, so they are algebraic. This
proves that the algebraic elements form a field. □

Suppose for example that α = √a, β = √b, where a, b ∈ F. Let us determine
a polynomial having γ = α + β as a root. To do this, we compute the powers
of γ, and we use the relations α^2 = a, β^2 = b to simplify when possible. Then
we look for a linear relation among the powers:

γ^2 = α^2 + 2αβ + β^2 = (a+b) + 2αβ

γ^4 = (a+b)^2 + 4(a+b)αβ + 4α^2β^2 = (a^2 + 6ab + b^2) + 4(a+b)αβ.

We won't need the other powers because we can eliminate αβ from these two
equations to obtain the equation γ^4 - 2(a+b)γ^2 + (a-b)^2 = 0. Thus γ is a
root of the polynomial

g(x) = x^4 - 2(a+b)x^2 + (a-b)^2,

which has coefficients in F, as required.
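
As a quick numerical sanity check of this relation, here is a minimal Python
sketch. It assumes F = Q and positive rational a, b, works in floating point
(so it can only confirm the relation approximately), and the names g and
residual are ours, chosen for illustration.

```python
import math

def g(x, a, b):
    """Evaluate g(x) = x^4 - 2(a+b)x^2 + (a-b)^2."""
    return x**4 - 2 * (a + b) * x**2 + (a - b) ** 2

def residual(a, b, sign_a=1, sign_b=1):
    """Value of g at gamma = (+-)sqrt(a) + (+-)sqrt(b); should be close to 0."""
    gamma = sign_a * math.sqrt(a) + sign_b * math.sqrt(b)
    return g(gamma, a, b)

if __name__ == "__main__":
    # All four sign choices give roots, since g depends only on a and b.
    for sa in (1, -1):
        for sb in (1, -1):
            print(sa, sb, residual(2, 3, sa, sb))
```

For a = 2, b = 3 the polynomial is x^4 - 10x^2 + 1, and the printed residuals
are zero up to rounding error.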

This method of undetermined coefficients will always produce a polynomial
having an element such as α + β as a root, if the irreducible polynomials for α
and β are known. Suppose that the degrees of two elements α, β are d_1, d_2,
and let n = d_1 d_2. Any element of F(α,β) is a linear combination, with
coefficients in F, of the n monomials α^i β^j, 0 ≤ i < d_1, 0 ≤ j < d_2. This is
because F(α,β) = F[α,β] (2.6), and these monomials span F[α,β]. Given an
element γ ∈ F(α,β), we write the powers 1, γ, γ^2, ..., γ^n as linear
combinations of these monomials, with coefficients in F. Since there are n + 1
of the powers γ^ν and only n monomials α^i β^j, the powers are linearly
dependent. A linear dependence relation determines a polynomial with
coefficients in F of which γ is a root.
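
The procedure can be sketched in Python with exact rational arithmetic. To
keep the sketch short we make a simplifying assumption which is not part of
the text: α and β are pure radicals, α^{d1} = a and β^{d2} = b with a, b in
F = Q, and γ = α + β. The function names and the indexing of the monomials
α^i β^j by i*d2 + j are ours.

```python
from fractions import Fraction

def times_gamma(v, d1, d2, a, b):
    """Multiply v (coefficient of alpha^i beta^j at index i*d2 + j) by alpha + beta."""
    out = [Fraction(0)] * (d1 * d2)
    for i in range(d1):
        for j in range(d2):
            c = v[i * d2 + j]
            if c == 0:
                continue
            # c * alpha^(i+1) beta^j, reduced with alpha^d1 = a
            if i + 1 < d1:
                out[(i + 1) * d2 + j] += c
            else:
                out[j] += c * a
            # c * alpha^i beta^(j+1), reduced with beta^d2 = b
            if j + 1 < d2:
                out[i * d2 + j + 1] += c
            else:
                out[i * d2] += c * b
    return out

def relation(d1, d2, a, b):
    """Return [c_0, ..., c_k] with c_k = 1 and c_0 + c_1 gamma + ... + c_k gamma^k = 0."""
    n = d1 * d2
    power = [Fraction(0)] * n
    power[0] = Fraction(1)                 # gamma^0 = 1
    rows = []                              # (pivot index, reduced vector, combination)
    for k in range(n + 1):
        vec = power[:]
        comb = [Fraction(0)] * (n + 1)
        comb[k] = Fraction(1)
        for pivot, pvec, pcomb in rows:    # eliminate against earlier powers
            if vec[pivot] != 0:
                f = vec[pivot] / pvec[pivot]
                vec = [x - f * y for x, y in zip(vec, pvec)]
                comb = [x - f * y for x, y in zip(comb, pcomb)]
        nonzero = [t for t in range(n) if vec[t] != 0]
        if not nonzero:
            return comb[: k + 1]           # first linear dependence found
        rows.append((nonzero[0], vec, comb))
        power = times_gamma(power, d1, d2, a, b)
    raise AssertionError("n + 1 vectors of length n must be dependent")

if __name__ == "__main__":
    print(relation(2, 2, 2, 3))   # coefficients of x^4 - 10x^2 + 1, constant term first
    print(relation(2, 2, 2, 2))   # coefficients of x^3 - 8x
```

The second call takes α = β = √2, so γ = 2√2. The relation found is
x^3 - 8x = x(x^2 - 8), which is reducible, although γ is indeed one of its
roots.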

But there is one point which complicates matters. Let g(x) be the polynomial
having γ as a root which we find in this way. This polynomial may be reducible.
For instance, it may happen that γ is actually in the field F, though α, β
aren't in F. If so, the method we described is unlikely to produce its
irreducible equation x - γ. It is harder to determine the irreducible
polynomial for γ over F. □

An extension K of a field F is called an algebraic extension, and K is said to
be algebraic over F, if all its elements are algebraic.

(3.11) Theorem. Let F ⊂ K ⊂ L be fields. If L is algebraic over K and K is
algebraic over F, then L is algebraic over F.

Proof. We need to show that every element α ∈ L is algebraic over F. We
are given that α is algebraic over K, hence that some equation of the form

α^n + a_{n-1}α^{n-1} + ⋯ + a_1 α + a_0 = 0

holds, with a_i ∈ K. Therefore α is algebraic over the field
F(a_0, ..., a_{n-1}) generated by a_0, ..., a_{n-1} over F. Note that each
coefficient a_i, being in K, is algebraic over F. We consider the chain of
fields

F ⊂ F(a_0) ⊂ F(a_0,a_1) ⊂ ⋯ ⊂ F(a_0,a_1,...,a_{n-1}) ⊂ F(a_0,a_1,...,a_{n-1},α)

obtained by adjoining the elements a_0, ..., a_{n-1}, α in succession. For each
i, a_{i+1} is algebraic over F(a_0, ..., a_i) because it is algebraic over F.
Also, α is algebraic over F(a_0,a_1,...,a_{n-1}). So each extension in the chain
is finite. By Theorem (3.4), the degree of F(a_0,a_1,...,a_{n-1},α) over F is
finite. Therefore by Corollary (3.6) α is algebraic over F. □


4. CONSTRUCTIONS WITH RULER AND COMPASS 

There are famous theorems which assert that certain geometric constructions，such 
as trisection of an angle, can not be done with ruler and compass alone. We will 
now use the concept of degree of a field extension to prove some of them. 

Here are the rules for basic ruler and compass construction:


(4.1) 

(a) Two points in the plane are given to start with. These points are considered to 
be constructed. 

(b) If two points have been constructed，we may draw the line through them, or 
draw a circle with center at one point and passing through the other. Such lines 
and circles are then considered to be constructed. 
(c) The points of intersection of lines and circles which have been constructed are
considered to be constructed.

Note that our ruler may be used only to draw straight lines through constructed 
points. We are not allowed to use it for measurement. Sometimes it is referred to as 
a “straight-edge” to make this point clear. 

We will describe all possible constructions, beginning with some familiar ones. 
In each figure，the lines and circles are to be drawn in the order indicated. 

(4.2) Construction. Draw a line through a constructed point p and perpendicular
to a constructed line ℓ.

Case 1: p ∉ ℓ



This construction works with any point q ∈ ℓ which is not on the
perpendicular. However, we had better not choose points arbitrarily, because if
we do we'll have difficulty keeping track of which points we have constructed
and which ones are merely artifacts of an arbitrary choice. Whenever we want an
arbitrary point, we will construct a particular one for the purpose.

Case 2: p ∈ ℓ

(4.3) Construction. Draw a line parallel to ℓ and passing through p. Apply
Cases 1 and 2 above:






(4.4) Construction. Mark off a length defined by two points onto a constructed
line ℓ starting at a constructed point p ∈ ℓ. Use construction of parallels.



These constructions allow us to introduce Cartesian coordinates into the plane 
so that the two points which are given to us to start have coordinates (0, 0) and 
(0,1). Other choices of coordinate systems could be used，but they lead to equivalent 
theories. 


[Figure: coordinate axes x and y.]


We will call a real number a constructible if its absolute value |a| is the
distance between two constructible points, the unit length being the distance
between the points given originally.


(4.5) Proposition. A point p = (a, b) is constructible if and only if its
Cartesian coordinates a and b are constructible numbers.









Proof. This follows from the above constructions. Given a point p, we can
construct its coordinates by dropping perpendiculars to the axes. Conversely, if
a and b are given constructible numbers, then we can construct the point p by
marking a, b off on the two axes using (4.4) and erecting perpendiculars. □

(4.6) Proposition. The constructible numbers form a subfield of R.

Proof. We will show that if a and b are positive constructible numbers, then
a + b, ab, a - b (if a > b), and a^{-1} (if a ≠ 0) are also constructible. The
closure in case a or b is negative follows easily.

Addition and subtraction are done by marking lengths on a line, using
Construction (4.4).

For multiplication, we use similar right triangles:



Given one triangle and one side of a second triangle, the second triangle can be con¬ 
structed by parallels. 

To construct the product ab, we take r = 1, s = a, and r' = b. Then since
r/s = r'/s', it follows that s' = ab. To construct a^{-1}, we take r = a,
s = 1, and r' = 1. Then s' = a^{-1}. □

(4.7) Proposition. If a is a positive constructible number, then so is √a.

Proof. We use similar triangles again. We must construct them so that r = a,
r' = s, and s' = 1. Then s = r' = √a.

How to make the construction is less obvious this time, but we can use
inscribed triangles in a circle. A triangle inscribed into a circle, with a
diameter as its hypotenuse, is a right triangle. This is a theorem of high
school geometry. It can be checked using the equation for a circle and
Pythagoras's theorem. So we draw a circle whose diameter is 1 + a and proceed as
in the figure below. Note that the large triangle is divided into two similar
triangles.



(4.8) Proposition. Suppose four points are given, whose coordinates are in a
subfield F of R. Let A, B be lines or circles drawn using the given points.
Then the points of intersection of A and B have coordinates in F, or in a field
of the form F(√r), where r is a positive number in F.

Proof. The line through (a_0,b_0), (a_1,b_1) has the linear equation

(a_1 - a_0)(y - b_0) = (b_1 - b_0)(x - a_0).

The circle with center (a_0,b_0) and passing through (a_1,b_1) has the
quadratic equation

(x - a_0)^2 + (y - b_0)^2 = (a_1 - a_0)^2 + (b_1 - b_0)^2.

The intersection of two lines can be found by solving two linear equations
whose coefficients are in F. So its coordinates are in F too. To find the
intersection of a line and a circle, we use the equation of the line to
eliminate one variable from the equation of the circle, obtaining a quadratic
equation in one unknown. This quadratic equation has solutions in the field
F(√D), where D is the discriminant, which is an element of F. If D < 0, the
line and circle do not intersect.

Consider the intersection of two circles, say

(x - a_1)^2 + (y - b_1)^2 = r_1^2 and (x - a_2)^2 + (y - b_2)^2 = r_2^2,

where a_i, b_i, r_i ∈ F. In general, the solution of a pair of quadratic
equations in two variables requires solving an equation of degree 4. In this
case we are lucky: The difference of the two quadratic equations is a linear
equation which we can use to eliminate one variable, as before. □
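
The last step of the proof can be illustrated by a short Python sketch. It is
only a floating-point illustration under our own conventions: a circle is
given as a triple (a, b, r), the function name circle_intersections is
hypothetical, and concentric circles are simply reported as having no
intersection points. The quantity D in the code is computed from the data by
rational operations alone, and the only irrational step is one square root.

```python
import math

def circle_intersections(c1, c2):
    """Intersect two circles given as (a, b, r): center (a, b), radius r."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    # Subtracting the two circle equations cancels x^2 and y^2 and leaves
    # the linear equation A*x + B*y = C.
    A = 2 * (a2 - a1)
    B = 2 * (b2 - b1)
    C = (a2**2 + b2**2 - r2**2) - (a1**2 + b1**2 - r1**2)
    n2 = A * A + B * B
    if n2 == 0:
        return []                      # concentric circles: no line to use
    # Foot of the perpendicular from the first center to that line.
    t0 = (C - A * a1 - B * b1) / n2
    fx, fy = a1 + A * t0, b1 + B * t0
    # D is obtained rationally from the data; its sign decides whether
    # the circles meet.
    D = (r1**2 - t0 * t0 * n2) / n2
    if D < 0:
        return []
    s = math.sqrt(D)
    return [(fx - B * s, fy + A * s), (fx + B * s, fy - A * s)]

if __name__ == "__main__":
    # Unit circles centered at the two given points (0,0) and (1,0).
    print(circle_intersections((0, 0, 1), (1, 0, 1)))
```

For the two unit circles centered at (0,0) and (1,0), the linear equation is
2x = 1 and the intersection points are (1/2, ±√3/2), whose coordinates lie in
Q(√3). When the circles are tangent, the sketch returns the same point twice.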

(4.9) Theorem. Let a_1, ..., a_m be constructible real numbers. There is a
chain of subfields Q = F_0 ⊂ F_1 ⊂ F_2 ⊂ ⋯ ⊂ F_n = K such that

(i) K is a subfield of R;

(ii) a_1, ..., a_m ∈ K;

(iii) for each i = 0, ..., n - 1, the field F_{i+1} is obtained from F_i by
adjoining the square root of a positive number r_i ∈ F_i, which is not a
square in F_i.

Conversely, let Q = F_0 ⊂ F_1 ⊂ ⋯ ⊂ F_n = K be a chain of subfields of R
which satisfies (iii). Then every element of K is constructible.

Proof. We introduced coordinates so that the points originally given have
coordinates in Q. The process of constructing the numbers a_i involves drawing
lines and circles and taking their intersections. So the first assertion
follows by induction from Proposition (4.8). Conversely, if such a tower of
fields is given, then its elements are constructible, by Propositions (4.6) and
(4.7). □

(4.10) Corollary. If a is a constructible real number, then it is algebraic,
and its degree over Q is a power of 2.

For, in the chain of fields (4.9), the degree of F_{i+1} over F_i is 2, and
hence [K : Q] = 2^n. Corollary (3.6) tells us that the degree of a divides 2^n,
hence that it is a power of 2. □

The converse of Corollary (4.10) is false. There exist real numbers a which
have degree 4 over Q but which are not constructible. We will be able to prove
this later, using Galois theory.

We can now prove the impossibility of certain geometric constructions. Our 
method will be to show that if a certain construction were possible, then it would 
also be possible to construct an algebraic number whose degree over Q is not a 
power of 2. This would contradict (4.10). 

Let us discuss trisection of the angle as the first example. We must pose the
problem carefully, because many angles, 45° for instance, can be trisected. The
customary way to state the problem is to ask for a single method of
construction which will work for any given angle.

To be as specific as possible, let us say that an angle θ is constructible if
its cosine cos θ is constructible. Other equivalent definitions are possible.
For example, with this definition, θ is constructible if and only if the line
which passes through the origin and meets the x-axis in the angle θ is
constructible. Or, θ is constructible if and only if it is possible to
construct any two lines meeting in an angle θ.

Now just giving an angle θ (say by marking off its cosine on the x-axis)
provides us with new information which may be used in a hypothetical
trisection. To analyze the consequences of this new information, we should
start over and determine all constructions which can be made when, in addition
to two points, one more length (= cos θ) is given at the start. We would rather
not take the time to do this, and there is a way out. We will exhibit a
particular angle θ with these properties:

(4.11) (i) θ is constructible, and
(ii) θ/3 is not constructible.

The first condition tells us that being given the angle θ provides no new
information for us: If the angle θ can be trisected when given, it can also be
trisected without being given. The second condition tells us that there is no
general method of trisection, because there is no way to trisect θ.

The angle θ = 60° does the job. A 60° angle is constructible because
cos 60° = 1/2. On the other hand, it is impossible to construct a 20° angle. To
show this, we will show that cos 20° is an algebraic number of degree 3 over Q.
Then Corollary (4.10) will show that cos 20° is not constructible, hence that
60° can not be trisected.

The addition formulas for sine and cosine can be used to prove the identity

(4.12) cos 3θ = 4 cos^3 θ - 3 cos θ.

Setting θ = 20° and a = cos 20°, we obtain the equation 1/2 = 4a^3 - 3a, or
8a^3 - 6a - 1 = 0.

(4.13) Lemma. The polynomial f(x) = 8x^3 - 6x - 1 is irreducible over Q.

Proof. It is enough to check for linear factors ax + b, where a, b are
integers such that a divides 8, and b = ±1. Another way to prove irreducibility
is to check that f has no root modulo 5. □
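
Both checks mentioned in the proof are finite and easy to carry out by
machine. The following Python sketch is one way to do so. The function names
are ours, and the last function only confirms numerically, in floating point,
that cos 20° satisfies the equation.

```python
import math
from fractions import Fraction

def f(x):
    """The polynomial 8x^3 - 6x - 1 of Lemma (4.13)."""
    return 8 * x**3 - 6 * x - 1

def rational_roots():
    """Roots among the candidates -b/a with a dividing 8 and b = +-1."""
    candidates = [Fraction(s, a) for a in (1, 2, 4, 8) for s in (1, -1)]
    return [c for c in candidates if f(c) == 0]

def roots_mod(p):
    """Residues x modulo p with f(x) = 0 (mod p)."""
    return [x for x in range(p) if f(x) % p == 0]

def cos20_residual():
    """f(cos 20 degrees), which should vanish up to rounding error."""
    return f(math.cos(math.radians(20)))

if __name__ == "__main__":
    print(rational_roots())   # no rational root
    print(roots_mod(5))       # no root modulo 5
    print(cos20_residual())
```

Both lists come out empty. Since the leading coefficient 8 is not divisible by
5, the reduction modulo 5 is still a cubic, and a cubic with no root is
irreducible.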
This lemma tells us that a has degree 3 over Q, hence that it can not be
constructed.

As another example, let us show that the regular 7-gon can not be constructed.
This is similar to the above problem: The construction of 20° is equivalent to
the construction of the 18-gon. Let θ denote the angle 2π/7 and let
ζ = cos θ + i sin θ. Then ζ is a root of the equation x^6 + x^5 + ⋯ + 1 = 0,
which is irreducible [Chapter 11 (4.6)]. Hence ζ has degree 6 over Q. If the
7-gon were constructible, then cos θ and sin θ would be constructible numbers,
and hence they would lie in a real field extension of degree 2^n over Q, by
Theorem (4.9). Call this field K, and consider the extension K(i). This
extension has degree 2. Therefore [K(i) : Q] = 2^{n+1}. But
ζ = cos θ + i sin θ ∈ K(i). This contradicts the fact that the degree of ζ
is 6 (3.6).

Notice that this argument is not special to the number 7. It applies to any
prime integer p, provided only that p - 1, the degree of the irreducible
polynomial x^{p-1} + ⋯ + x + 1, is not a power of 2.

(4.14) Corollary. Let p be a prime integer. If the regular p-gon can be
constructed by ruler and compass, then p = 2^r + 1 for some integer r. □

Gauss proved the converse: If a prime p has the form 2^r + 1, then the regular
p-gon can be constructed. The regular 17-gon, for example, can be constructed
with ruler and compass. We will learn how to prove this in the next chapter.
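
To see how restrictive the condition p = 2^r + 1 is, one can list the odd
primes below a bound which satisfy it. The sketch below does this by trial
division. The bound and the function names are our own choices for
illustration.

```python
def is_prime(n):
    """Trial division; adequate for the small bounds used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_power_of_two(m):
    """True if m = 2^r for some integer r >= 0."""
    return m >= 1 and m & (m - 1) == 0

def primes_of_form_2r_plus_1(limit):
    """Odd primes p < limit such that p - 1 is a power of 2."""
    return [p for p in range(3, limit) if is_prime(p) and is_power_of_two(p - 1)]

def excluded_primes(limit):
    """Odd primes p < limit for which Corollary (4.14) rules out the p-gon."""
    return [p for p in range(3, limit) if is_prime(p) and not is_power_of_two(p - 1)]

if __name__ == "__main__":
    print(primes_of_form_2r_plus_1(300))
    print(excluded_primes(20))
```

Below 300 only the primes 3, 5, 17, and 257 pass the test, while 7, 11, 13,
and 19 are already excluded below 20.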


5. SYMBOLIC ADJUNCTION OF ROOTS


Up to this point, we have used subfields of the complex numbers as our
examples. Abstract constructions are not needed to create these fields (except
that the construction of C from R is abstract). We simply adjoin complex
numbers to the rational numbers as desired and work with the subfield they
generate. But finite fields and function fields are not subfields of a
familiar, all-encompassing field analogous to C, so these fields must be
constructed. The fundamental tool for their construction is the adjunction of
elements to a ring, which we studied in Section 5 of Chapter 10. It is applied
here to the case that the ring we start with is a field F.

Let us review this construction. Given a polynomial f(x) with coefficients in
F, we may adjoin an element a satisfying the polynomial equation f(a) = 0 to F.
The abstract procedure is to form the polynomial ring F[x] and then take the
quotient ring

(5.1) R' = F[x]/(f).

This construction always yields a ring R' and a homomorphism F → R', such that
the residue x̄ of x satisfies the relation f(x̄) = 0.

However, we want to construct not only a ring, but a field, and here the theory
of polynomials over a field comes into play. Namely, that theory tells us that
the principal ideal (f) is a maximal ideal if and only if f is irreducible
[Chapter 11 (1.6)]. Therefore the ring R' will be a field if and only if f is
an irreducible polynomial.
(5.2) Lemma. Let F be a field, and let f be an irreducible polynomial in F[x].
Then the ring K = F[x]/(f) is an extension field of F, and the residue x̄ of x
is a root of f(x) in K.

Proof. The ring K is a field because (f) is a maximal ideal. Also, the
homomorphism F → K, which sends the elements of F to the residues of the
constant polynomials, is injective, because F is a field. So we may identify F
with its image, a subfield of K. The field K becomes an extension of F by means
of this identification. Finally, x̄ satisfies the equation f(x̄) = 0, which
means that it is a root of f. □

(5.3) Proposition. Let F be a field, and let f(x) be a monic polynomial in F[x]
of positive degree. There exists a field extension K of F such that f(x)
factors into linear factors over K.

Proof. We use induction on the degree of f. The first case is that f has a
root a in F, so that f(x) = (x - a)g(x) for some polynomial g. If so, we
replace f by g, and we are done by induction. Otherwise, we choose an
irreducible factor g(x) of f(x). By Lemma (5.2), there is a field extension of
F, call it F_1, in which g(x) has a root a. We replace F by F_1 and are thereby
reduced to the first case. □

As we have seen, the polynomial ring F[x] is an important tool for studying
extensions of a field F. When we are working with two fields at the same time,
there is an interplay between their polynomial rings. This interplay doesn't
present serious difficulties, but instead of scattering the points which need
to be mentioned about in the text, we have collected them here.

Notice that if K is an extension field of F, then the polynomial ring K[x]
contains F[x] as a subring. So computations which are made in the ring F[x] are
also valid in K[x].

(5.4) Proposition. Let f and g be polynomials with coefficients in a field F,
and let K be an extension field of F.

(a) Division with remainder of g by f gives the same answer, whether carried
out in F[x] or in K[x].

(b) f divides g in K[x] if and only if f divides g in F[x].

(c) The monic greatest common divisor d of f and g is the same, whether
computed in F[x] or in K[x].

(d) If f and g have a common root in K, then they are not relatively prime in
F[x]. Conversely, if f and g are not relatively prime in F[x], then there
exists an extension field L in which they have a common root.

(e) If f is irreducible in F[x] and if f and g have a common root in K, then f
divides g in F[x].

Proof. (a) Carry out the division in F[x]: g = fq + r. This equation also
holds in the bigger ring K[x], and further division of the remainder by f is
not possible, because r has lower degree than f, or else it is zero.

(b) This is the case that the remainder is zero in (a).

(c) Let d, d' denote the monic greatest common divisors of f and g in F[x] and
in K[x]. Then d is also a common divisor in K[x]. So d divides d' in K[x], by
definition of d'. In addition, we know that d has the form d = pf + qg, for
some elements p, q ∈ F[x]. Since d' divides f and g, it divides pf + qg = d
too. Thus d and d' are associates in K[x], and, being monic, they are equal.

(d) Let a be a common root of f and g in K. Then x - a is a common divisor of
f and g in K[x]. So their greatest common divisor in K[x] is not 1. By (c), it
is not 1 in F[x] either. Conversely, if f and g have a common divisor d of
degree > 0, then by (5.3), d has a root in some extension field L. This root
will be a common root of f and g.

(e) If f is irreducible, then its only divisors in F[x] are 1, f, and their
associates. Part (d) tells us that the greatest common divisor of f and g in
F[x] is not 1. Therefore it is f. □

The final topic of this section concerns the derivative f'(x) of a polynomial
f(x). In algebra, the derivative is computed using the rules from calculus for
differentiating polynomial functions. In other words, we define the derivative
of x^n to be the polynomial nx^{n-1}, and if
f(x) = a_n x^n + a_{n-1} x^{n-1} + ⋯ + a_1 x + a_0, then

(5.5) f'(x) = n a_n x^{n-1} + (n-1) a_{n-1} x^{n-2} + ⋯ + a_1.

The integer coefficients in this formula are to be interpreted as elements of F
by means of the homomorphism Z → F [Chapter 10 (3.18)]. So the derivative is a
polynomial with coefficients in the same field. It can be shown that rules such
as the product rule for differentiation hold.

Though differentiation is an algebraic procedure, there is no a priori reason
to suppose that it has much algebraic significance; however, it does. For us,
the most important property of the derivative is that it can be used to
recognize multiple roots of a polynomial.


(5.6) Lemma. Let F be a field, let f(x) ∈ F[x] be a polynomial, and let a ∈ F
be a root of f(x). Then a is a multiple root, meaning that (x - a)^2 divides
f(x), if and only if it is a root of both f(x) and f'(x).

Proof. If a is a root of f, then x - a divides f: f(x) = (x - a)g(x). Then a
is a root of g if and only if it is a multiple root of f. By the product rule
for differentiation,


f'(x) = (x - a)g'(x) + g(x).


Substituting x = a shows that f'(a) = 0 if and only if g(a) = 0. □


(5.7) Proposition. Let f(x) ∈ F[x] be a polynomial. There exists a field
extension K of F in which f has a multiple root if and only if f and f' are not
relatively prime.

Proof. If f has a multiple root in K, then f and f' have a common root in K by
Lemma (5.6), and so they are not relatively prime in K[x]. Hence they are not
relatively prime in F[x] either. Conversely, if f and f' are not relatively
prime, then they have a common root in some field extension K, hence f has a
multiple root there. □

Here is one of the most important applications of the derivative to field theory: 

(5.8) Proposition. Let f be an irreducible polynomial in F[x]. Then f has no
multiple root in any field extension of F unless the derivative f' is the zero
polynomial. In particular, if F is a field of characteristic zero, then f has
no multiple root.

Proof. By the previous proposition, we must show that f and f' are relatively
prime unless f' is the zero polynomial. Since f is irreducible, the only way
that it can have a nonconstant factor in common with another polynomial g is
for f to divide g (5.4e). And if f divides g, then deg g ≥ deg f, or else
g = 0. Now the degree of the derivative f' is less than the degree of f. So f
and f' have no nonconstant factor in common unless f' = 0, as required. In a
field of characteristic zero, the derivative of a nonconstant polynomial is not
zero. □

The derivative of a nonconstant polynomial f(x) may be identically zero if F
has prime characteristic p. This happens when the exponent of every monomial
occurring in f is divisible by p. A typical polynomial whose derivative is zero
in characteristic 5 is

f(x) = x^15 + ax^10 + bx^5 + c,

where a, b, c can be arbitrary elements of F. Since the derivative of this
polynomial is identically zero, its roots in any extension field are all
multiple roots. Whether or not this polynomial is irreducible depends on F and
on a, b, c.
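
The criterion of Proposition (5.7) is easy to apply by machine when F is a
prime field F_p. The sketch below is an illustration under our own
conventions: a polynomial is a list of integer coefficients with the constant
term first, the zero polynomial is the empty list, and the function names are
hypothetical. The modular inverse pow(c, -1, p) requires Python 3.8 or later.

```python
def trim(f):
    """Drop leading zero coefficients (stored at the end of the list)."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def derivative(f, p):
    """Formal derivative (5.5), with coefficients reduced modulo p."""
    return trim([(k * c) % p for k, c in enumerate(f)][1:])

def poly_mod(f, g, p):
    """Remainder of f on division by a nonzero polynomial g over F_p."""
    f = trim([c % p for c in f])
    g = trim([c % p for c in g])
    inv = pow(g[-1], -1, p)
    while len(f) >= len(g):
        q = (f[-1] * inv) % p
        shift = len(f) - len(g)
        f = trim([(c - q * g[i - shift]) % p if i >= shift else c
                  for i, c in enumerate(f)])
    return f

def monic_gcd(f, g, p):
    """Monic greatest common divisor over F_p; f is assumed nonzero."""
    f = trim([c % p for c in f])
    g = trim([c % p for c in g])
    while g:
        f, g = g, poly_mod(f, g, p)
    inv = pow(f[-1], -1, p)
    return [(c * inv) % p for c in f]

if __name__ == "__main__":
    f = [1, -1, -1, 1]            # (x - 1)^2 (x + 1) = x^3 - x^2 - x + 1
    print(monic_gcd(f, derivative(f, 7), 7))    # x - 1, written modulo 7
    h = [4] + [0] * 4 + [3] + [0] * 4 + [2] + [0] * 4 + [1]
    print(derivative(h, 5))       # x^15 + 2x^10 + 3x^5 + 4 has derivative 0
```

In the first example the greatest common divisor x - 1 (printed as [6, 1])
exhibits the double root 1. In the second, the derivative is the zero
polynomial, so the greatest common divisor of h and h' is h itself.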


6. FINITE FIELDS

In this section, we describe all fields having finitely many elements. We
remarked in Section 1 that a finite field K contains one of the prime fields
F_p, and of course since K is finite, it will be finite-dimensional when
considered as a vector space over this field. Let us denote F_p by F, and let r
denote the degree [K : F]. As an F-vector space, K is isomorphic to the space
F^r, and this space contains p^r elements. So the order of a finite field is
always a power of a prime. It is customary to use the letter q for this number:

(6.1) q = p^r = |K|.

When referring to finite fields, p will always denote a prime integer and q a
power of p, the number of elements, or order, of the field.

Fields with q elements are often denoted by F_q. We are going to show that all
fields with the same number of elements are isomorphic, so this notation is not
too ambiguous. However, the isomorphism will not be unique when r > 1.
The simplest example of a finite field other than the prime field F_p is the
field K = F_4 of order 4. There is a unique irreducible polynomial f(x) of
degree 2 in F_2[x], namely

(6.2) f(x) = x^2 + x + 1

[see Chapter 11 (4.3)], and the field K is obtained by adjoining a root a of
f(x) to F = F_2:

K ≈ F[x]/(x^2 + x + 1).

The order of this field is 4 because a has degree 2, which tells us that K has
dimension 2 as a vector space over the field F.

The set (1, a) forms a basis of K over F, so the elements of K are the four
linear combinations of these two elements, with mod-2 coefficients 0, 1. They
are

(6.3) {0, 1, a, 1 + a} = F_4.

The element 1 + a is the second root of the polynomial f(x) in K. Computation
in K is made using the relations 1 + 1 = 0 and a^2 + a + 1 = 0. Do not confuse
the field F_4 with the ring Z/(4)!
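
The two relations are all that is needed to compute in F_4, as the following
Python sketch shows. The encoding is ours: the pair (c0, c1) stands for
c0 + c1·a with c0, c1 in {0, 1}.

```python
ZERO, ONE, A, A1 = (0, 0), (1, 0), (0, 1), (1, 1)   # 0, 1, a, 1 + a
F4 = [ZERO, ONE, A, A1]

def add(u, v):
    """Addition uses only the relation 1 + 1 = 0."""
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)

def mul(u, v):
    """(u0 + u1 a)(v0 + v1 a), simplified with a^2 = a + 1."""
    c0 = u[0] * v[0] + u[1] * v[1]
    c1 = u[0] * v[1] + u[1] * v[0] + u[1] * v[1]
    return (c0 % 2, c1 % 2)

def f(u):
    """The polynomial (6.2), f(x) = x^2 + x + 1, evaluated in F_4."""
    return add(add(mul(u, u), u), ONE)

def has_zero_divisors_mod4():
    """In the ring Z/(4), the nonzero residue 2 satisfies 2 * 2 = 0."""
    return (2 * 2) % 4 == 0

if __name__ == "__main__":
    print([u for u in F4 if f(u) == ZERO])      # the roots a and 1 + a
    print(mul(A, A1))                           # a(1 + a) = 1
    print(has_zero_divisors_mod4())
```

Every nonzero element of F_4 has an inverse, for instance a(1 + a) = 1,
whereas in Z/(4) the residue of 2 is a zero divisor. This is one way to see
that the two rings are different.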

Here are the main facts about finite fields: 

(6.4) Theorem. Let p be a prime, and let q = p^r be a power of p, with r ≥ 1.

(a) There exists a field of order q.

(b) Any two fields of order q are isomorphic.

(c) Let K be a field of order q. The multiplicative group K^× of nonzero
elements of K is a cyclic group of order q - 1.

(d) The elements of K are roots of the polynomial x^q - x. This polynomial has
distinct roots, and it factors into linear factors in K.

(e) Every irreducible polynomial of degree r in F_p[x] is a factor of x^q - x.
The irreducible factors of x^q - x in F_p[x] are precisely the irreducible
polynomials in F_p[x] whose degree divides r.

(f) A field K of order q contains a subfield of order q' = p^k if and only if k
divides r.

The proof of this theorem is not very difficult, but since there are several
parts, it will take some time. To motivate it, we will look at a few
consequences first.

The striking aspect of (c) is that all nonzero elements of K can be listed as
powers of a single suitably chosen one. This is not obvious, even for the prime
field F_p. For example, the residue of 3 is a generator of F_7^×. Its powers
3^0, 3^1, 3^2, ... list the nonzero elements of F_7 in the following order:

(6.5) F_7^× = {1, 3, 2, 6, 4, 5}.

As another example, 2 is a generator of F_11^×, and its powers list that group
in the order

(6.6) F_11^× = {1, 2, 4, 8, 5, 10, 9, 7, 3, 6}.

A generator for the cyclic group F_p^× is called a primitive element modulo p.
Note that the theorem does not tell us how to find a primitive element, only
that one exists. Which residues modulo p are primitive elements is not well
understood, but given a small prime p, we can find one by trial and error.
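
The trial and error search is easily automated for small primes. The sketch
below lists the powers of a residue and tests whether they exhaust the
nonzero residues. The function names are ours, and the method is brute force,
suitable only for small p.

```python
def powers(v, p):
    """The list v^0, v^1, ..., v^(p-2) of residues modulo p."""
    return [pow(v, k, p) for k in range(p - 1)]

def is_primitive(v, p):
    """True if the powers of v run through all nonzero residues modulo p."""
    return sorted(powers(v, p)) == list(range(1, p))

def primitive_elements(p):
    """All primitive elements modulo a prime p, found by trial and error."""
    return [v for v in range(1, p) if is_primitive(v, p)]

if __name__ == "__main__":
    print(powers(3, 7))            # the ordering (6.5)
    print(powers(2, 11))           # the ordering (6.6)
    print(primitive_elements(7))
```

The first two lines of output reproduce the orderings (6.5) and (6.6). Modulo
7 the search finds the primitive elements 3 and 5, while 2 is not primitive,
because its powers are only 1, 2, 4.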

We now have two ways of listing the nonzero elements of F_p, additively and
multiplicatively:

(6.7) F_p^× = {1, 2, 3, ..., p - 1} = {1, ν, ν^2, ..., ν^{p-2}},

where ν is a primitive element modulo p. Depending on the context, one or the
other list may be the best for computation.

Of course, the additive group F_p^+ of the prime field is always a cyclic group
of order p. Both the additive and multiplicative structures of the prime field
are very simple: They are cyclic. But the field structure of F_p, governed by
the distributive law, fits the two together in a subtle way.

Part (e) of the theorem is also striking. It is the basis for many methods of
factoring polynomials modulo p. Let us look at a few cases in which q is a
power of 2 as examples:

(6.8) Examples.

(a) The elements of the field F_4 are the roots of the polynomial

(6.9) x^4 - x = x(x - 1)(x^2 + x + 1).

In this case, the irreducible factors of x^4 - x in Z[x] happen to remain
irreducible in F_2[x]. Note that the factors of x^2 - x appear here, because
F_4 contains F_2.

Since we are working in characteristic 2, the signs are irrelevant:
x - 1 = x + 1.

(b) The field F_8 of order 8 has degree 3 over the prime field F_2. Its
elements are the eight roots of the polynomial

(6.10) x^8 - x = x(x - 1)(x^3 + x + 1)(x^3 + x^2 + 1), in F_2[x].

So the six elements in F_8 which aren't in F_2 fall into two classes: the three
roots of x^3 + x + 1 and the three roots of x^3 + x^2 + 1.

The cubic factors of (6.10) are the two irreducible polynomials of degree 3 in
F_2[x] [see Chapter 11 (4.3)]. Notice that the irreducible factorization of
this polynomial in the ring of integers is

(6.11) x^8 - x = x(x - 1)(x^6 + x^5 + ⋯ + x + 1), in Z[x].

The third factor is reducible modulo 2.

To compute in the field F_8, choose a root β of one of the cubics, say of
x^3 + x + 1. Then (1, β, β^2) is a basis of F_8 as a vector space over F_2. The
elements of F_8 are the eight linear combinations with coefficients 0, 1:

(6.12) {0, 1, β, 1 + β, β^2, 1 + β^2, β + β^2, 1 + β + β^2}.

Computation in F_8 is done using the relation β^3 + β + 1 = 0.

Note that F_4 is not contained in F_8. It couldn't be, because [F_8 : F_2] = 3,
[F_4 : F_2] = 2, and 2 does not divide 3.
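
Computation in F_8 can be sketched in Python as follows. The encoding is our
own: the integer with binary digits c2 c1 c0 stands for c0 + c1 β + c2 β^2, so
that β is the integer 2, addition is bitwise exclusive or, and products are
reduced with β^3 = β + 1.

```python
BETA = 0b010    # the element beta

def mul8(u, v):
    """Multiply two elements of F_8, reducing with beta^3 = beta + 1."""
    prod = 0
    for k in range(3):                 # polynomial product with mod-2 coefficients
        if (v >> k) & 1:
            prod ^= u << k
    for k in (4, 3):                   # remove the terms of degree 4, then 3
        if (prod >> k) & 1:
            prod ^= 0b1011 << (k - 3)  # 0b1011 encodes x^3 + x + 1
    return prod

def power8(u, n):
    """u^n in F_8, by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = mul8(result, u)
    return result

def roots_of_first_cubic():
    """Elements u with u^3 + u + 1 = 0."""
    return [u for u in range(8) if (power8(u, 3) ^ u ^ 1) == 0]

if __name__ == "__main__":
    print([power8(BETA, k) for k in range(7)])      # all seven nonzero elements
    print(all(power8(u, 8) == u for u in range(8))) # every element is a root of x^8 - x
    print(roots_of_first_cubic())                   # beta, beta^2, beta^4
```

The powers of β run through all seven nonzero elements, in agreement with part
(c) of Theorem (6.4), and the three roots of x^3 + x + 1 are β, β^2, and
β^4 = β + β^2.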

(c) The field F_16: The polynomial x^16 - x = x(x^15 - 1) is divisible in Z[x]
by x^3 - 1 and by x^5 - 1. Carrying out the division over the integers gives
this factorization:

(6.13) x^16 - x =

x(x - 1)(x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^8 - x^7 + x^5 - x^4 + x^3 - x + 1).

This is the irreducible factorization in Z[x]. But in F_2[x], the factor of
degree 8 is not irreducible, and

(6.14) x^16 - x =

x(x - 1)(x^2 + x + 1)(x^4 + x^3 + x^2 + x + 1)(x^4 + x^3 + 1)(x^4 + x + 1).

This factorization displays the three irreducible polynomials of degree 4 in
F_2[x]. Note that the factors of x^4 - x appear among the factors of x^16 - x.
This agrees with the fact that F_16 contains F_4.
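
The factorizations (6.10), (6.13), and (6.14) can be checked by multiplying the
factors back together. The following Python sketch does this. Polynomials are
coefficient lists with the constant term first, the optional argument p reduces
coefficients modulo p, and all names are our own.

```python
from functools import reduce

def mul(f, g, p=None):
    """Product of two polynomials, over Z or (if p is given) over F_p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return [c % p for c in out] if p else out

def product(factors, p=None):
    """Product of a list of polynomials."""
    return reduce(lambda f, g: mul(f, g, p), factors, [1])

def x_power_minus_x(q, p=None):
    """The polynomial x^q - x, for q >= 2."""
    f = [0, -1] + [0] * (q - 2) + [1]
    return [c % p for c in f] if p else f

X, X_MINUS_1 = [0, 1], [-1, 1]
QUADRATIC = [1, 1, 1]                       # x^2 + x + 1
QUARTIC = [1, 1, 1, 1, 1]                   # x^4 + x^3 + x^2 + x + 1
OCTIC = [1, -1, 0, 1, -1, 1, 0, -1, 1]      # x^8 - x^7 + x^5 - x^4 + x^3 - x + 1

if __name__ == "__main__":
    cubics = [[1, 1, 0, 1], [1, 0, 1, 1]]   # x^3 + x + 1 and x^3 + x^2 + 1
    print(product([X, X_MINUS_1] + cubics, 2) == x_power_minus_x(8, 2))
    common = [X, X_MINUS_1, QUADRATIC, QUARTIC]
    print(product(common + [OCTIC]) == x_power_minus_x(16))
    quartics = [[1, 0, 0, 1, 1], [1, 1, 0, 0, 1]]   # x^4 + x^3 + 1 and x^4 + x + 1
    print(product(common + quartics, 2) == x_power_minus_x(16, 2))
```

All three comparisons print True. The check confirms the products only. It
does not by itself show that the listed factors are irreducible.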

We will now begin the proof of Theorem (6.4). We will prove the various
parts in the following order: (d), (c), (a), (b), (e), and (f).

Proof of Theorem (6.4d). Let K be a field of order q. The multiplicative group
K^× has order q - 1. Therefore the order of any element a ∈ K^× divides q - 1:
a^{q-1} = 1. This means that a is a root of the polynomial x^{q-1} - 1. The
remaining element of K, zero, is a root of the polynomial x. So every element
of K is a root of x(x^{q-1} - 1) = x^q - x. Since this polynomial has q
distinct roots in K, it factors into linear factors in that field:

(6.15) x^q - x = ∏_{a ∈ K} (x - a).

This proves part (d) of the theorem. □

Proof of Theorem (6.4c). By an n-th root of unity in a field F, we mean an
element a whose nth power is 1. Thus a is an nth root of unity if and only if
it is a root of the polynomial

(6.16) x^n - 1,

or if and only if its order, as an element of the multiplicative group F^×,
divides n. The nonzero elements of a finite field with q elements are
(q - 1)-st roots of unity.
In the field of complex numbers, the nth roots of unity form a cyclic group of
order n, generated by

(6.17) ζ_n = e^{2πi/n}.

A field need not have many roots of unity. For example, the only real ones are
±1. But one property of the complex numbers carries over to arbitrary fields:
The nth roots of unity in any field form a cyclic group. For example, in the
field K = F_4 of order 4, the group K^× is a cyclic group of order 3, generated
by a. [See (6.3).]

(6.18) Proposition. Let F be a field, and let H be a finite subgroup of the
multiplicative group F^× of order n. Then H is a cyclic group, and it consists
of all the nth roots of unity in F.

Proof. If H has order n, then the order of an element a of H divides n, so a
is an nth root of unity, a root of the polynomial x^n - 1. This polynomial has
at most n roots, so there aren't any other roots in F [Chapter 11 (1.18)]. It
follows that H is the set of all nth roots of unity in F.

It is harder to show that H is cyclic. To do so, we use the Structure Theorem
for abelian groups, which tells us that H is isomorphic to a direct product of
cyclic groups:

H ≈ Z/(d_1) × ⋯ × Z/(d_k),

where d_1 | d_2 | ⋯ | d_k and n = d_1 ⋯ d_k. The order of any element of this
product divides d_k because d_k is a common multiple of all the integers d_i.
So every element of H is a root of

x^{d_k} - 1.

This polynomial has at most d_k roots in F. But H contains n elements, and
n = d_1 ⋯ d_k. The only possibility is that n = d_k, k = 1, and H is cyclic. □

Proof of Theorem (6.4a). We need to prove the existence of a field with q
elements. Since we have already proved part (d) of the theorem, we know that
the elements of a field of order q are roots of the polynomial x^q - x. Also,
there exists a field L containing F_p in which this polynomial (or any given
polynomial) factors into linear factors (5.3). The natural thing to try is to
take such a field L and hope for the best, namely that the roots of x^q - x
form the subfield K of L we are looking for. This is shown by the following
proposition:

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(6.19) Proposition. Let p be a prime，and let q = p r . 

(a) The polynomial x^q - x has no multiple root in any field L of characteristic p.

(b) Let L be a field of characteristic p, and let K be the set of roots of x^q - x in L.
Then K is a subfield.

This proposition, combined with Proposition (5.3), proves the existence of a field
with q elements.

Proof of Proposition (6.19). (a) The derivative of x^q - x is q x^{q-1} - 1. In
characteristic p, the coefficient q is equal to 0, so the derivative is equal to -1. Since
the constant polynomial -1 has no root, x^q - x and its derivative have no common
root. Proposition (5.7) shows that x^q - x has no multiple root.

(b) Let α, β ∈ L be roots of the polynomial x^q - x. We have to show that α ± β,
αβ, and α^{-1} (if α ≠ 0) are roots of the same polynomial. This is clear for the
product and quotient: If α^q = α and β^q = β, then (αβ)^q = αβ and (α^{-1})^q = α^{-1}. It
is not obvious for the sum, and to prove it we use the following proposition:

(6.20) Proposition. Let L be a field of characteristic p, and let q = p^r. Then in
the polynomial ring L[x, y] we have (x + y)^q = x^q + y^q.

Proof. We first prove the proposition for the case q = p. We expand (x + y)^p
in Z[x, y], obtaining

(x + y)^p = x^p + \binom{p}{1} x^{p-1} y + \binom{p}{2} x^{p-2} y^2 + ⋯ + \binom{p}{p-1} x y^{p-1} + y^p,

by the Binomial Theorem. The binomial coefficient \binom{p}{r} is an integer, and if
0 < r < p, it is divisible by p [see the proof of (4.6) in Chapter 11]. It follows that
the map Z[x, y] → L[x, y] sends these coefficients to zero and that
(x + y)^p = x^p + y^p in L[x, y].

We now treat the general case q = p^r by induction on r: Suppose that the
proposition has been proved for integers less than r and that r > 1. Let q' = p^{r-1}.
Then by induction, (x + y)^q = ((x + y)^{q'})^p = (x^{q'} + y^{q'})^p = (x^{q'})^p + (y^{q'})^p =
x^q + y^q. □
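The divisibility of the middle binomial coefficients, and the resulting collapse of (x + y)^q modulo p, can be checked directly for small cases. This is only an illustrative sketch; the function names are assumptions, and the computation is done with integer coefficients reduced modulo p rather than in a general field L.

```python
from math import comb


def middle_binomials_divisible(p):
    """True if p divides every binomial coefficient C(p, r) with 0 < r < p."""
    return all(comb(p, r) % p == 0 for r in range(1, p))


def power_of_sum_mod_p(q, p):
    """Coefficients of (x + y)^q mod p; entry r is the coefficient of x^(q-r) y^r."""
    return [comb(q, r) % p for r in range(q + 1)]


print(middle_binomials_divisible(7))   # p prime
print(middle_binomials_divisible(4))   # 4 is not prime: C(4, 2) = 6
print(power_of_sum_mod_p(9, 3))        # q = 3^2: only the two end terms survive
```

The second call shows why primality matters: for a composite exponent the middle coefficients need not vanish.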

To complete the proof of Proposition (6.19), we evaluate x, y at α, β to
conclude that (α + β)^q = α^q + β^q. Then if α^q = α and β^q = β, we find
(α + β)^q = α + β, as required. The case of α - β follows by substituting -β for
β. □

Proof of Theorem (6.4b). Let K and K' be fields of order q, and let α be a
generator of the cyclic group K^×. Then K is certainly generated as a field extension
of F = 𝔽_p by the element α: K = F(α). Let f(x) be the irreducible polynomial for α
over F, so that K ≈ F[x]/(f) (2.6). Then α is a root of two polynomials: f(x) and
x^q - x. Since f is irreducible, it divides x^q - x (5.4e). We now go over to the
second field K'. Since x^q - x factors into linear factors in K', f has a root α' in K'.
Then K ≈ F[x]/(f) ≈ F(α'). Since K and K' have the same order, F(α') = K';
hence K and K' are isomorphic. □

Proof of Theorem (6.4e). Let f(x) be an irreducible polynomial of degree r in
F[x], where F = 𝔽_p as before. It has a root α in some field extension L of F, and
the subfield K = F(α) of L has degree r over F (3.2). Therefore K has order
q = p^r, and by part (d) of the theorem, α is also a root of x^q - x. Since f is
irreducible, it divides x^q - x, as required.

In order to prove the same thing for irreducible polynomials whose degree k
divides r, it suffices to prove the following lemma:

(6.21) Lemma. Let k be an integer dividing r, say r = ks, and let q = p^r,
q' = p^k. Then x^{q'} - x divides x^q - x.

For if f is irreducible of degree k, then, as above, f divides x^{q'} - x, which in turn
divides x^q - x in F[x], for any field F.

Proof of the lemma. This is tricky, because we will use the identity

(6.22) y^d - 1 = (y - 1)(y^{d-1} + ⋯ + y + 1)

twice. Substituting y = q' and d = s shows that q' - 1 divides q - 1 = q'^s - 1.
Knowing this, we can conclude that x^{q'-1} - 1 divides x^{q-1} - 1 by substituting
y = x^{q'-1} and d = (q - 1)/(q' - 1). Therefore x^{q'} - x divides x^q - x too. □
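Lemma (6.21) can be tested by long division in Z[x]. The following is a sketch under the assumption that polynomials are stored sparsely as dictionaries mapping exponents to coefficients; the function name is hypothetical. Because the divisor x^{q'} - x is monic, the division never leaves the integers.

```python
def divides_xq_minus_x(q_small, q_big):
    """Return True if x^q_small - x divides x^q_big - x in Z[x] (q_small >= 2)."""
    remainder = {q_big: 1, 1: -1}
    while remainder and max(remainder) >= q_small:
        top = max(remainder)
        c = remainder.pop(top)
        # Subtracting c * x^(top - q_small) * (x^q_small - x) cancels the
        # leading term and adds c to the coefficient of x^(top - q_small + 1).
        e = top - q_small + 1
        new = remainder.get(e, 0) + c
        if new:
            remainder[e] = new
        else:
            remainder.pop(e, None)
    return not remainder


print(divides_xq_minus_x(4, 16))  # p = 2, k = 2, r = 4: k divides r
print(divides_xq_minus_x(4, 8))   # p = 2, k = 2, r = 3: k does not divide r
```

The second call illustrates the converse direction discussed after the lemma: when k does not divide r, the division leaves a nonzero remainder.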

So we have shown that every irreducible polynomial whose degree divides r is
a factor of x^q - x. On the other hand, if f is irreducible and if its degree k doesn’t
divide r, then since [K : F] = r, f doesn’t have a root in K, and therefore f doesn’t
divide x^q - x. □

Proof of Theorem (6.4f). If k does not divide r, then q = p^r is not a power of
q' = p^k, so a field of order q cannot be an extension of a field of order q'. On the
other hand, if k does divide r, then Lemma (6.21) and part (d) of the theorem show
that the polynomial x^{q'} - x has all its roots in a field K of order q. Now Proposition
(6.19) shows that K contains a field with q' elements. □

This completes the proof of Theorem (6.4).


7. FUNCTION FIELDS


In this section we take a look at function fields, the third class of field extensions
mentioned in Section 1. The field C(x) of rational functions in one variable x will be
denoted by F throughout the section. Its elements are fractions g(x) = p(x)/q(x) of
polynomials p, q ∈ C[x], with q ≠ 0. We usually cancel common factors in p and q
so that they have no root in common.

Let us use the symbol P to denote the complex plane, with the complex
coordinate x. A rational function g = p/q determines a complex-valued function of x,
which is defined for all x ∈ P such that q(x) ≠ 0, that is, except at the roots of the
polynomial q. Near a root of q, the function defined by g tends to infinity. These
roots are called poles of g. (We usually use the phrase “rational function” to mean an
element of the field of fractions of the polynomial ring. It is unfortunate that the
word function is already there. This prevents us from modifying the phrase in a
natural way when referring to the actual function defined by such a fraction. The
terminology is ambiguous, but this can’t be helped.)

A minor complication arises because formal rational functions do not define
functions at certain points, namely at their poles. When working with the whole
field F, we have to face the fact that every value a of x can be a pole of a rational
function, for example of the function (x - a)^{-1}. There is no way to choose a
common domain of definition for all rational functions at once. Fortunately this is not a
serious problem, and there are two ways to get around it. One is to introduce an
extra value ∞ and to define g(a) = ∞ if a is a pole of g. This is actually the better way
for many purposes, but for us another way will be easier. It is simply to ignore bad
behavior at a finite set of points.

Any particular computations we may make will involve finitely many
functions, so they will be valid except at a finite set of points of the plane P, the poles of
these functions. A rational function is determined by its value at any infinite set of
points. This is proved below, in Lemma (7.2). So we can throw finite sets out of the
domain of definition as needed, without losing control of the function. Since a
rational function is continuous wherever it is defined, we can recover its value at a point
x_0 which was thrown out unnecessarily, as

(7.1) g(x_0) = lim_{x → x_0} g(x).

(7.2) Lemma. If two rational functions f_1, f_2 agree at infinitely many points of the
plane, then they are equal elements of F.

Proof. Say that f_i = p_i/q_i, where p_i, q_i ∈ C[x]. Let h(x) = p_1 q_2 - p_2 q_1. If
h(x) is the zero polynomial, then f_1 = f_2. If h(x) is not zero, then it has finitely
many roots, so there are only finitely many points at which f_1 = f_2. □
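The proof reduces equality of two fractions to the vanishing of the polynomial h = p_1 q_2 - p_2 q_1. A minimal sketch of that test follows, assuming polynomials are given as coefficient lists with the lowest degree first; the helper names are illustrative.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_sub(a, b):
    """Difference a - b, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [ai - bi for ai, bi in zip(a, b)]


def equal_rational_functions(p1, q1, p2, q2):
    """True if p1/q1 and p2/q2 are equal, i.e. h = p1*q2 - p2*q1 is zero."""
    h = poly_sub(poly_mul(p1, q2), poly_mul(p2, q1))
    return all(c == 0 for c in h)


# (x^2 - 1)/(x - 1) and (x + 1)/1 are the same element of the field
print(equal_rational_functions([-1, 0, 1], [-1, 1], [1, 1], [1]))
# x/1 and 1/x are different: h = x^2 - 1 has only two roots
print(equal_rational_functions([0, 1], [1], [1], [0, 1]))
```

In the second case the two functions still agree at the two roots of h, which is the finite exceptional set the lemma allows.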

In order to formalize the intuitive procedure of ignoring trouble at finite sets of
points, it is convenient to have a notation for the result of throwing out a finite set.
Given an infinite set U, we will denote by U' a set obtained from U by deleting an
unspecified finite subset, which is allowed to vary as needed:

(7.3) U' = U - (variable finite set).

By a function on U' we mean an equivalence class of complex-valued
functions, each defined except on a finite subset of U. Two such functions f, g are called
equal on U' if there is a finite subset A of U such that f and g are defined and equal
on U - A. (We could also refer to this property by saying that f = g almost
everywhere on U. However, in other contexts, “almost everywhere” often means “except
on a set of measure zero,” rather than “except on a finite set.”) A function f on U'
will be called continuous if it is represented by a continuous function on some set
U - A.

The set of continuous functions on U' will be denoted by

(7.4) ℱ(U) = {continuous functions on U'}.

This set forms a ring, with the usual laws of addition and multiplication of functions:

(7.5) [f + g](x) = f(x) + g(x) and [fg](x) = f(x)g(x).

Lemma (7.2) has the following corollary:

(7.6) Proposition. The field F = C(x) is isomorphic to a subring of the ring
ℱ(P), where P is the complex plane. □

Let us now examine one of the simplest function fields in more detail. We are
going to need polynomials with coefficients in the field F. Since the symbol x has
already been assigned, we use y to denote the new variable. We will study the
quadratic field extension K obtained from F by adjoining a root of f(y), where
f = y^2 - x. Since f depends on the variable x as well as on y, we will also write

(7.7) f = f(x, y) = y^2 - x.

The polynomial y^2 - x is an irreducible element of F[y], so K can be constructed as
the abstract field F[y]/(f). The residue of the variable y is a root of f in K.

The importance of function fields comes from the fact that their elements can
be interpreted as actual functions. In our case, we can define a square root function
h, by choosing one of the two values of the square root for each complex number
x: h(x) = √x. Then h can be interpreted as a function on P'. However, since
there are two values of the square root whenever x ≠ 0, we need to make a lot of
choices to define this function. This isn’t very satisfactory. If x is real and positive,
it is natural to choose the positive square root, but no choice will give a continuous
function on the whole complex plane.

The locus S of solutions of the equation y^2 - x = 0 in C^2 is called the
Riemann surface of the polynomial y^2 - x (see Section 8 of Chapter 10). It is
depicted below in Figure (7.9), but in order to obtain a surface in real 3-space, we
have dropped one coordinate. The complex two-dimensional space C^2 is identified
with R^4 by the usual rule (x, y) = (x_0 + x_1 i, y_0 + y_1 i) ↔ (x_0, x_1, y_0, y_1). The
figure depicts the locus

(7.8) {(x_0, x_1, y_0) | y_0 = real part of (x_0 + x_1 i)^{1/2}}.

This is a projection of S from R^4 to R^3.

(7.9) Figure. The Riemann surface y^2 = x.


The Riemann surface S does not cut itself along the negative x_0-axis as the projected
surface does. Every negative real number x has two purely imaginary square roots,
but the real parts of these square roots are zero. This produces the apparent
self-crossing in the projected surface. Actually, S is a two-sheeted branched covering of
P, as defined in Chapter 10 (8.13), and the only branch point is at x = 0.

Figure (7.9) shows the problem encountered when we try to define the square
root as a single-valued function. When x is real and positive, the positive square root
is the natural choice. We would like to extend this choice continuously over the
complex plane, but we run into trouble: Winding once around the origin in complex
x-space brings us back to the negative square root. It is better to accept the fact that
the square root, as a solution of the equation y^2 - x = 0, is a multi-valued function
on P'.
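The effect of winding once around the origin can be observed numerically. The sketch below is an illustration, not part of the text's argument: it follows a continuous choice of square root around the unit circle by picking, at each small step, whichever of the two roots is nearer to the previous value. The function name and the step count are assumptions.

```python
import cmath
import math


def continue_square_root(steps=1000):
    """Follow a continuous branch of the square root once around the unit circle.

    Starts at x = 1 with y = 1 and, at each step, keeps whichever of the two
    square roots of x is closer to the previous value of y.
    """
    y = 1 + 0j
    for k in range(1, steps + 1):
        x = cmath.exp(2j * math.pi * k / steps)
        root = cmath.sqrt(x)  # principal value; the other root is -root
        y = root if abs(root - y) <= abs(-root - y) else -root
    return y


print(continue_square_root())  # close to -1, not +1
```

Starting from the positive square root of 1, the continuation returns to x = 1 carrying the negative square root, which is the obstruction described above.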

Now there is an amazing trick which will allow us to solve any polynomial
equation f(x, y) = 0 with a single-valued function, without making arbitrary
choices. The trick is to replace the complex plane P by the Riemann surface S, the
locus f(x, y) = 0. We are given two functions on S, namely the restrictions of the
coordinate functions on C^2. In order to keep things straight, let us introduce new
symbols for these functions, say X, Y:

(7.10) X(x, y) = x and Y(x, y) = y, for (x, y) ∈ S.

These restrictions of the coordinate functions to S are related by the equation
f(X, Y) = 0, because by definition of S, f(x, y) = 0 at any point of S.

(7.11) Proposition. Let f(x, y) be an irreducible polynomial in C[x, y] which is
not a polynomial in x alone, and let S = {(x, y) | f(x, y) = 0} be its Riemann
surface. Let K = F[y]/(f) be the field extension defined by f. Then K is isomorphic to
a subring of the ring ℱ(S) of continuous functions on S'.

Proof. Let g(x) be a rational function. Since X is the restriction of a
coordinate function on C^2, the composed function g(X) is continuous on S except at the
points which lie above the poles of g. There are finitely many such points [Chapter
10 (8.11)]. So g(X) is a continuous function on S'. We define a homomorphism
F → ℱ(S) by sending g(x) to g(X). Next, the Substitution Principle extends this
map to a homomorphism

(7.12) φ: F[y] → ℱ(S),

by sending y to Y. Since f(X, Y) = 0, the polynomial f(x, y) is in the kernel of φ.
Since K = F[y]/(f), the mapping property of quotients [Chapter 10 (4.2)] gives us
a map φ̄: K → ℱ(S) which sends the residue of y to Y. Since K is a field, φ̄ is
injective. □

(7.13) Definition. An isomorphism of branched coverings S_1, S_2 of the plane P is a
homeomorphism φ': S_1' → S_2' which is compatible with the maps π_i: S_i → P,
that is, such that π_2 φ' = π_1.

By this we mean that φ' is defined except on a finite set of S_1 and that when suitable
finite sets are omitted from S_1 and S_2, φ' is a homeomorphism.

A branched covering S is called connected if the complement S' of an arbitrary
finite set of S is a path-connected set.

We will now state a beautiful theorem which describes the finite extensions of
the field of rational functions. Let ℰ_n denote the set of isomorphism classes of
extension fields K of F of degree n. Let 𝒞_n denote the set of isomorphism classes of
connected n-sheeted branched coverings π: S → P of the plane.


(7.14) Theorem. Riemann Existence Theorem: There is a bijective map
Φ_n: ℰ_n → 𝒞_n. If K is the extension obtained by adjoining a root of an irreducible
polynomial f(x, y) ∈ C[x, y], then the class of branched coverings corresponding to
K is represented by the Riemann surface of f. □

The proof of this theorem is a suitable topic for a course in complex variables.
It requires too much analysis to give here. Using it, however, we can associate a
branched covering of the plane, unique up to isomorphism, to every finite extension
field K of F. This covering is called the Riemann surface of the extension field K.
The Riemann surface of F is the complex plane P itself.

Here are two striking corollaries of the theorem: 

(7.15) Corollary. Given a connected n-sheeted branched covering S of the plane,
there is a polynomial f(x, y) of degree n in y whose Riemann surface is isomorphic
to S.

This follows from the surjectivity of the map Φ_n and from a fact which will be
proved in the next chapter [Chapter 14 (4.1)], that every finite extension K of F can
be obtained by adjoining a single element. □

(7.16) Corollary. Let f, g be irreducible polynomials in C[x, y], with Riemann
surfaces S, T. Let α be a root of f(y) in an extension field of F. If S and T are
isomorphic branched coverings, then g(y) has a root in F(α).

This follows from the injectivity of the map Φ_n. □

Visualization of Riemann surfaces is complicated by the fact that they are
embedded in C^2, a four-dimensional real space. One aid to constructing and visualizing
them is a method known as cut and paste. If we cut the surface y^2 - x open along
the negative real axis, the double locus in Figure (7.9), then it decomposes into the
two parts re Y > 0 and re Y < 0. Each of these parts projects to the x-plane P in a
bijective way, if we disregard what happens along the cut. Turning this procedure
around, we can construct a surface which is homeomorphic to S in the following
way: We stack two copies P_1, P_2 of the complex plane over P and cut them open
along the negative real axis (-∞, 0]. These copies of P are called sheets. Then we
glue side A of P_1 to side B of P_2 and vice versa (see below). Four dimensions are
needed to embed S without crossings.

(7.17) Figure. [A cut, with its two sides labelled side A and side B.]

To construct a general branched covering S of the plane by the cut-and-paste
procedure, we begin with n copies of the plane P, called sheets. The sheets are
labelled P_1, …, P_n and are stacked up over P. We also select a finite set of points
a_1, …, a_r of P to be branch points. For each branch point a_v, we choose a curve C_v
beginning at a_v and going to infinity in an arbitrary direction. This should be done
in such a way that the curves C_v do not intersect. The sheets P_i are cut open along
these curves. Then various sheets are glued to others along opposite edges of the
cuts.

To describe the resulting covering S, we need only describe the permutations
σ_v by which the sheets are glued together along the cuts. To be specific, we draw a
small loop ℓ_v around the point a_v in the counterclockwise direction. Then if the
permutation σ_v sends the index 1 to 3, we glue sheet P_1 to sheet P_3 as we cross C_v. This
means that if we start on sheet P_1 and wind once around the loop ℓ_v, we return on
sheet P_3. The permutation can be arbitrary.





The points a_v are called branch points of the surface S because the surface
decomposes into n disjoint sheets near any other point of P. It won’t have n disjoint
sheets above the point a_v unless the permutation σ_v is the identity. If σ_v = 1, then
each sheet is glued back to itself along the cut C_v, so that cut was not needed. But it
is convenient to allow this as a possibility. Let’s call a_v a true branch point if
σ_v ≠ 1. Some of the points a_v may not be true branch points. However, all true
branch points are among them.

It is important to note that the numbering of the sheets is arbitrary and, in
particular, that the concept of a “top sheet” has no intrinsic meaning for the Riemann
surface of a polynomial. If there were a top sheet, we could define y as a
single-valued function by choosing the value on that sheet. One can do this only once the
Riemann surface has been cut open. This is the whole point: wandering around on
the surface will lead us from one sheet to another.

It is not difficult to decide when two such branched coverings are isomorphic. 

(7.18) Proposition. Let S, T be branched coverings which are constructed as
above, with the same branch points a_v and the same curves C_v, but using different
sets of permutations (σ_1, …, σ_r) and (τ_1, …, τ_r). Then S and T are isomorphic
coverings if and only if the two sets of permutations are conjugate, that is, if and only if
there is a permutation ρ such that τ_v = ρ^{-1} σ_v ρ for all v.

Proof. Let a, C, σ stand for a_v, C_v, σ_v. Our rule is that P_i is glued to P_{iσ} along C.
Suppose that we relabel the sheets P_1, …, P_n, changing the numbers by a
permutation ρ. To keep old and new labellings straight, let’s label the renumbered sheets as
Q_j. So for every i, P_i is relabelled as Q_{iρ}. The rule now tells us to glue P_i = Q_{iρ} to
P_{iσ} = Q_{iσρ}. Substituting i = jρ^{-1} shows that the rule glues Q_j to Q_{jρ^{-1}σρ}. Thus the
permutation which describes this gluing rule is the conjugate ρ^{-1} σ_v ρ of the old
permutation σ_v. Since the covering is not changed by the relabelling process, this
shows that a conjugate set of permutations defines an isomorphic covering.

Conversely, let φ: S → T be an isomorphism of coverings. Let P_1, …, P_n be
the sheets which are used to construct S, and let Q_1, …, Q_n be those used to construct
T. Then since P_i is connected and since T, when cut open, is a disjoint union of the
open sets Q_j, the image of P_i must be contained in a single sheet Q_j. Since φ is
compatible with the projections to P, which are homeomorphisms except on the cuts, the
restriction of φ to P_i must be a bijection onto the sheet Q_j. So we can renumber the
sheets Q_j so that P_i is mapped to Q_i. This changes the permutations τ_v to conjugates,
as above. So we may assume that φ carries P_i to Q_i. Also, φ is continuous across the
cuts. Therefore if crossing the cut C_v on sheet P_i leads to P_j, then, similarly,
crossing on Q_i must lead to Q_j. Therefore σ_v = τ_v. □
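For a small number of sheets, the criterion of Proposition (7.18) can be tested by trying every relabelling ρ. The sketch below assumes permutations are stored as tuples with s[i] the image of index i (indices starting at 0) and that they act on the right, as in the proof; the function names are illustrative.

```python
from itertools import permutations


def conjugate_by(sigma, rho):
    """Return rho^-1 sigma rho, where permutations act on the right.

    A permutation is a tuple s with s[i] the image of index i, so the result
    sends j to ((j rho^-1) sigma) rho.
    """
    n = len(rho)
    rho_inv = [0] * n
    for i, image in enumerate(rho):
        rho_inv[image] = i
    return tuple(rho[sigma[rho_inv[j]]] for j in range(n))


def isomorphic_coverings(sigmas, taus):
    """True if one single rho conjugates every sigma_v to the matching tau_v."""
    n = len(sigmas[0])
    return any(
        all(conjugate_by(s, rho) == t for s, t in zip(sigmas, taus))
        for rho in permutations(range(n))
    )


# Two different transpositions on three sheets, then the same pair in the
# other order: relabelling the sheets carries one gluing to the other.
print(isomorphic_coverings([(0, 2, 1), (1, 0, 2)], [(1, 0, 2), (0, 2, 1)]))
# The same transposition at both branch points is a different covering.
print(isomorphic_coverings([(0, 2, 1), (1, 0, 2)], [(0, 2, 1), (0, 2, 1)]))
```

The key point is that the same ρ must work for all branch points at once; conjugating each σ_v separately is not enough.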

We can also start with an arbitrary branched covering S and reconstruct it in
this way: Say that S is branched at the points a_1, …, a_r ∈ P. As above, we choose
nonintersecting curves C_i beginning at a_i and going to infinity. Then if S is cut open
above the curves C_i, it decomposes into n sheets. This is a theorem of topology,
because the complement of the curves C_i in P is simply connected [Munkres, Topology,
p. 342, exc. 8]. Therefore a covering homeomorphic to S can be reconstructed from
n sheets P_1, …, P_n by cutting them open along the curves and gluing together to mix
up the sheets.

We will now describe the Riemann surfaces of a few simple polynomials f.
This is usually difficult to do when f is complicated.

(7.19) Example. The Riemann surface of y^3 - x: Here y represents a cube root of
x, and S is a three-sheeted covering of P. The only branch point is x = 0. We cut S
open above the positive real axis C = [0, ∞). This decomposes S into three sheets
P_1, P_2, P_3, and it is reasonable to guess that the gluing along the cut is done by a
cyclic permutation.

This case is fairly easy to analyze because x is a single-valued function of y.
Because of this, we can interpret S as the graph of a function from y-space to
x-space, which implies that the projection of S onto the complex y-plane is bijective.
We identify S with the y-plane using this projection and cut it open above C. This
will decompose the plane into three parts corresponding to the sheets P_i. The rules
for gluing will be evident when this decomposition is made explicit.

The values of y lying over the cut C are those for which y^3 = x is real and
positive. They are y = re^{iθ}, where θ = 0, 2π/3, or 4π/3. So the sheets are sectors.
[Figure: the x-plane and the y-plane.]

In the figure, the sectors have been numbered arbitrarily. Note that under the map
y ↦ y^3 = x, each of the three sectors is stretched radially and maps bijectively to
the entire plane, disregarding the cuts. As we move along S to cross the cut in the
x-plane, we also cross one of the three cuts in the y-plane. As predicted, this
permutes the sheets by the cyclic permutation (123). □


(7.20) Example.

The Riemann surface of f(x, y) = y^3 - 3y - x: The points x at which this
polynomial has fewer than three roots are found by solving the equations f = ∂f/∂y = 0
[see Chapter 10 (8.12)]. Here ∂f/∂y = 3(y^2 - 1). So the solutions are y = ±1,
and hence x = ∓2. We may cut S open above the curves C_1 = (-∞, -2] and
C_2 = [2, ∞), to decompose it into three sheets.

Again, x is a single-valued function of y, and we can analyze the gluing of the
sheets by cutting the y-plane apart suitably. To do so, we ask for the values of y such
that x lies on one of the curves C_i. Since these curves are on the real x-axis, we
begin by solving the equation im x = 0. Setting y = u + vi, we find im x =
im(y^3 - 3y) = v(3u^2 - v^2 - 3). The solutions are the u-axis v = 0 and the two
branches of the hyperbola 3u^2 - v^2 = 3. The points on the u-axis in the interval
(-2, 2) correspond to x ∈ (-2, 2), so they do not lie over the cuts.
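The factorization of the imaginary part is easy to confirm numerically. In the sketch below the function names are illustrative; the last sample point lies on the hyperbola 3u^2 - v^2 = 3, so both expressions should vanish there.

```python
def im_of_x(u, v):
    """Imaginary part of x = y^3 - 3y at y = u + vi."""
    y = complex(u, v)
    return (y**3 - 3 * y).imag


def factored_form(u, v):
    """The closed form v(3u^2 - v^2 - 3) derived in the text."""
    return v * (3 * u**2 - v**2 - 3)


for u, v in [(0.5, 1.0), (-2.0, 0.3), (1.0, 0.0), (2.0, 3.0)]:
    print(u, v, im_of_x(u, v), factored_form(u, v))
```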





[Figure: the cuts in the x-plane and the corresponding regions of the y-plane.]
Again, each of the three regions into which the y-plane decomposes is mapped
bijectively to the x-plane by the function y^3 - 3y, disregarding the cut as always. In
the figure, the dotted curves are those which lie over C_i. The figure shows that
moving on S to cross the curve (-∞, -2] interchanges the sheets P_1, P_2, leaving P_3 alone,
and similarly that crossing above [2, ∞) interchanges P_2, P_3. So the branching is
described by the transposition (12) at the branch point x = -2 and by (23) at x = 2. □
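The branching can also be observed numerically by following the three roots of y^3 - 3y - x as x travels once around a small circle. The sketch below is an illustration only: it swaps in Durand-Kerner iteration to find starting roots and Newton's method to follow them, neither of which is the method of the text, and all names and step counts are assumptions. The numbering of the roots is arbitrary, so only the cycle type of the resulting permutation is meaningful.

```python
import cmath
import math


def f(y, x):
    return y**3 - 3 * y - x


def df_dy(y):
    return 3 * (y**2 - 1)


def initial_roots(x, iterations=200):
    """Approximate the three roots of y^3 - 3y - x by Durand-Kerner iteration."""
    roots = [(0.4 + 0.9j) ** k for k in range(3)]
    for _ in range(iterations):
        for i in range(3):
            denom = 1
            for j in range(3):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= f(roots[i], x) / denom
    return roots


def monodromy(center, radius=1.0, steps=2000):
    """Permutation of the roots after one counterclockwise loop around center."""
    start = initial_roots(center + radius)
    current = list(start)
    for k in range(1, steps + 1):
        x = center + radius * cmath.exp(2j * math.pi * k / steps)
        for i in range(3):
            y = current[i]
            for _ in range(5):  # a few Newton steps from the previous value
                y -= f(y, x) / df_dy(y)
            current[i] = y
    # Entry i is the index of the starting root that root i has moved onto.
    return tuple(
        min(range(3), key=lambda j: abs(start[j] - end)) for end in current
    )


print(monodromy(2))    # a transposition: two roots exchanged, one fixed
print(monodromy(-2))   # again a transposition
print(monodromy(0))    # no branch point inside the loop: the identity
```

A loop of radius 1 around x = 2 or x = -2 encloses exactly one branch point, while a loop of radius 1 around x = 0 encloses none.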

(7.21) Example. The Riemann surface of y^2 - x^3 + x^2: There are two points
x = 0, 1 above which S has fewer than two points. However, at x = 0 the sheets
cross without getting mixed up, so the only true branch point is x = 1. To see this
we make the change of variable x = x, z = y/x, which is defined and invertible
except at x = 0. Then z^2 - x + 1 = 0. The given surface S becomes homeomorphic
to the Riemann surface of z^2 - x + 1 when the points above the origin are deleted,
and the surface can be reduced to (7.9) by a translation in the x-plane. □


When it is not possible to solve for x as a single-valued function of y, the
problem of describing the gluing data becomes more difficult. We will work out one
example of this type.

(7.22) Example. The Riemann surface of y^2 - (x^3 - x): There are three points at
which x^3 - x = 0, namely x = 0, ±1, and the surface has three branch points at
which it behaves like the Riemann surface of y^2 - x at the origin. Our systematic
procedure is to make cuts from these three branch points to infinity, but in this case
another choice of cuts is easier to analyze. The values of x such that y is purely
imaginary are the real x such that x^3 - x < 0. These are the points in the two intervals
(-∞, -1] and [0, 1]. If we cut S open along these two intervals, it will decompose
into the parts re y > 0 and re y < 0. Thus we can reconstruct the surface S by
stacking up two copies of P, cutting them open along the intervals and gluing to mix
up the sheets as before.


(7.23) Figure.

The fact that a surface constructed by the cut-and-paste method crosses itself
along the cuts makes it confusing to visualize directly. But since the cuts are along
the real axis in this example, we can avoid crossings by turning one of the sheets
over. This ruins the representation of S as a double covering of P, but the advantage
is that the sheets are now glued along the same side of the cut. There are two such
cuts in Figure (7.23). Turning one sheet over and stretching to pull the slits apart
after gluing shows that this Riemann surface is homeomorphic to a torus with one
point deleted. □
8. TRANSCENDENTAL EXTENSIONS

In this section we will take a brief look at some transcendental field extensions. We
saw in Proposition (2.5) that the structure of the field extension F(α) generated by a
single transcendental element α over a field F does not depend on the element α. But
if two transcendental elements α, β are adjoined at the same time, the structure of
the field F(α, β) which is obtained will depend on whether or not the elements α and
β are algebraically related, and if they are related, the structure will depend on the
nature of this relation. For example, a = and )3 = ^/rr 1 are transcen¬ 

dental numbers over Q, which are related by the equation 

β^2 - α^3 + α = 0.

In general, we call a set of elements {α_1, …, α_n} of an extension field K ⊃ F
algebraically dependent over F if there is a nonzero polynomial in n variables
f(x_1, …, x_n) ∈ F[x_1, …, x_n] such that

f(α_1, …, α_n) = 0,

and we call them algebraically independent over F if there is no such polynomial.
Thus \/tt and Vtt 1 are algebraically dependent over Q. It is conjectured 

that e and π are algebraically independent, but this has not been proved.

We can interpret algebraic independence in terms of the substitution
homomorphism φ: F[x_1, …, x_n] → K sending f(x_1, …, x_n) ↦ f(α_1, …, α_n). The
elements α_1, …, α_n are algebraically independent if ker φ = 0, that is, if φ is injective,
and algebraically dependent otherwise. Passing to fields of fractions gives this
proposition:

(8.1) Proposition. If α_1, …, α_n are algebraically independent, then F(α_1, …, α_n) is
isomorphic to the field F(x_1, …, x_n) of rational functions in x_1, …, x_n, the field of
fractions of F[x_1, …, x_n]. □

An extension of the form F(α_1, …, α_n), where the α_i are algebraically independent, is
called a pure transcendental extension.

(8.2) Definition. A transcendence basis for a field extension K of F is a set of
elements (α_1, …, α_n) which are algebraically independent and such that K is an
algebraic extension of the field F(α_1, …, α_n).

(8.3) Theorem. Let (α_1, …, α_m) and (β_1, …, β_n) be elements in an extension K of
a field F. Assume that K is algebraic over F(β_1, …, β_n) and that α_1, …, α_m are
algebraically independent over F. Then m ≤ n, and (α_1, …, α_m) can be completed to a
transcendence basis for K by adding (n - m) of the elements β_i.

We leave the proof of this theorem as an exercise. □ 

(8.4) Corollary. Any two transcendence bases for an extension F C K have the 
same number of elements. □ 

(8.5) Definition. The transcendence degree of K is the number of elements in a
transcendence basis, or is infinite if no finite transcendence basis exists.

(8.6) Examples

(a) The fields F(x_1, …, x_n) of rational functions in n variables are not isomorphic
extensions of F for different values of n, because (x_1, …, x_n) is a transcendence basis.

(b) Let α, β be as at the beginning of the section. The single element π forms a
transcendence basis for K = Q(α, β) over Q. Therefore (8.3) implies that, as was
asserted above, any two elements of K are algebraically dependent. The element β
is another transcendence basis.

(c) Consider any two polynomials or rational functions in one variable f, g ∈ F(x).
There is a nonzero polynomial φ(y, z) ∈ F[y, z] such that φ(f, g) = 0. For, the
transcendence degree of F(x) is 1, and hence f, g are algebraically dependent.
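For a concrete instance of Example (c), take f = x^2 and g = x^3, which satisfy the relation φ(y, z) = y^3 - z^2. This particular pair and relation are chosen here for illustration and do not come from the text. The sketch substitutes f and g into φ using coefficient lists (lowest degree first) and checks that the result is the zero polynomial.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_pow(a, n):
    """n-th power of a polynomial, by repeated multiplication."""
    result = [1]
    for _ in range(n):
        result = poly_mul(result, a)
    return result


def relation(f, g):
    """Evaluate phi(y, z) = y^3 - z^2 at y = f, z = g; returns a coefficient list."""
    cube, square = poly_pow(f, 3), poly_pow(g, 2)
    n = max(len(cube), len(square))
    cube += [0] * (n - len(cube))
    square += [0] * (n - len(square))
    return [c - s for c, s in zip(cube, square)]


f = [0, 0, 1]      # x^2
g = [0, 0, 0, 1]   # x^3
print(relation(f, g))  # all coefficients are zero
```

For a general pair f, g the relation has to be found, for instance by elimination; the sketch only verifies a relation that is known in advance.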

Most field extensions aren’t pure transcendental, though this may be difficult to 
decide for a particular extension. Here are two examples: 

(8.7) Proposition.

(a) The function field L = C(x)[y]/(y^2 - x^3) is a pure transcendental extension of
C. It is the field of rational functions in t = y/x.

(b) The function field K = C(x)[y]/(y^2 - x^3 + x) is not a pure transcendental
extension of C. That is, there is no element t ∈ K such that K = C(t).

Proof. In both cases, the transcendence degree of the field over C is 1, because x is a
transcendence basis.

(a) Let t = y/x. Then C(t) ⊂ L because t ∈ L. Now L is generated by x and y, by
definition. On the other hand, x = t^2 and y = t^3. Therefore L = C(t). Since L has
transcendence degree 1, (8.4) shows that t is transcendental.
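The identities x = t^2 and y = t^3 used in part (a) can be checked with exact rational arithmetic. The sketch below uses hypothetical helper names and samples only rational values of the parameter, which is enough to illustrate the two-way correspondence away from t = 0.

```python
from fractions import Fraction


def point_from_parameter(t):
    """The point (x, y) = (t^2, t^3) on the curve y^2 = x^3."""
    return t**2, t**3


def parameter_from_point(x, y):
    """Recover t = y/x from a point with x != 0."""
    return y / x


for t in (Fraction(1, 2), Fraction(-3), Fraction(5, 7)):
    x, y = point_from_parameter(t)
    print(t, y**2 == x**3, parameter_from_point(x, y) == t)
```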

(b) (Sketch) To show that K is not a field of rational functions, we appeal to the
geometry of its Riemann surface. We saw in the last section that this surface is a torus
from which one point has been deleted. On the other hand, the Riemann surface of
the field of rational functions C(t) is the complex plane itself. Now, it is a theorem
of topology that the torus and the plane are not homeomorphic and that they do not
become homeomorphic when finite sets are deleted. If we admit this theorem, then
the next proposition will complete the proof.

(8.8) Proposition. Let $K = \mathbb{C}(x)[y]/(f)$ and $L = \mathbb{C}(t)[u]/(g)$ be function fields
with Riemann surfaces S, T respectively. A homomorphism $\varphi: L \to K$ which is the
identity on the subfield $\mathbb{C}$ induces a map $\varphi^*: S' \to T$ between their Riemann
surfaces, which is defined and continuous except on a finite set of points of S. If $\varphi$ is an
isomorphism, then $\varphi^*$ becomes a homeomorphism when suitable finite sets are
deleted from S and T.

Note that the map $\varphi^*$ goes from the Riemann surface of K to that of L, in the
opposite direction from $\varphi$.

Proof. The Riemann surface T is the locus g(t, u) = 0 in $\mathbb{C}^2$. According to
Proposition (7.11), every element $\alpha \in K$ defines a continuous function on S', so
the pair of functions $(\varphi(t), \varphi(u))$ defines a continuous map $S' \to \mathbb{C}^2$. Since
g(t, u) = 0 in L and since $\varphi$ is a homomorphism which leaves the coefficients of g
fixed, $g(\varphi(t), \varphi(u)) = 0$ too. So S' is mapped to T. This is the required map $\varphi^*$. If
$\varphi$ is an isomorphism, its inverse defines a map $T' \to S$ which is an inverse
function to $\varphi^*$ on the complement of a finite set. □


9. ALGEBRAICALLY CLOSED FIELDS 

A field F is said to be algebraically closed if every polynomial $f(x) \in F[x]$ of
positive degree has a root in F. The fact that the field $\mathbb{C}$ of complex numbers is
algebraically closed is called the Fundamental Theorem of Algebra.

(9.1) Theorem. Fundamental Theorem of Algebra: Every nonconstant polynomial
with complex coefficients has a complex root.

We have used this theorem often already. A proof is at the end of the section. 

If a field F is algebraically closed, then every nonconstant polynomial
$f(x) \in F[x]$ has a linear factor $x - \alpha$, so the only irreducible polynomials are the
linear ones. Consequently every polynomial is a product of linear factors. Also,
there are no algebraic extensions of F other than F itself (whence the phrase
algebraically closed). For if $\alpha$ is algebraic over F, then it is a root of a monic irreducible
polynomial $f(x) \in F[x]$. This polynomial must have the form $x - \alpha$, so $\alpha \in F$.

It may be convenient to think of a field F which is being studied as a subfield
of an algebraically closed field. For instance, we like to think of number fields as
subfields of $\mathbb{C}$. Let us call an extension field K of F an algebraic closure of F if

(9.2) (i) K is algebraic over F, and
(ii) K is algebraically closed.

(9.3) Corollary. Let F be a subfield of $\mathbb{C}$. The subset $\bar{F}$ of $\mathbb{C}$ consisting of all
numbers which are algebraic over F is an algebraic closure of F.

Proof. The fact that $\bar{F}$ is a field has been proved (3.10). To show that $\bar{F}$ is
algebraically closed, let $f(x) \in \bar{F}[x]$ be a nonconstant polynomial. Then f(x) has a
root $\alpha$ in $\mathbb{C}$, and $\bar{F}(\alpha)$ is algebraic over $\bar{F}$. Since $\bar{F}$ is algebraic over F, $\alpha$ is algebraic
over F by (3.11). So $\alpha \in \bar{F}$. □

It is not hard to construct an algebraic closure of a finite field $\mathbb{F}_p$, as a union of
the fields $\mathbb{F}_q$, where $q = p^r$ is a power of p. To do this, we choose a sequence of
integers $r_1, r_2, \dots$ with these properties: (i) $r_i$ divides $r_{i+1}$, and (ii) every integer n
divides some $r_i$. We may take $r_i = i!$, for example. We set $q_i = p^{r_i}$ and $F_i = \mathbb{F}_{q_i}$. It
follows from (i) that $F_{i+1}$ contains a subfield isomorphic to $F_i$ (6.4), so we can build
a tower of fields $F_1 \subset F_2 \subset \cdots$. Let $\bar{F}$ be the union of this chain of fields. Then (ii)
tells us that every finite field $\mathbb{F}_q$, $q = p^r$, is isomorphic to a subfield of some $F_i$,
hence to a subfield of $\bar{F}$. This field is an algebraic closure of $\mathbb{F}_p$.
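
The two divisibility conditions on the exponents are easy to check by machine. The following is a minimal sketch, assuming the choice $r_i = i!$ from the text; the helper names `tower_exponents` and `first_index_containing` are hypothetical and introduced only for this illustration. The last assertion checks only a numerical consequence of the inclusion of $F_i$ in $F_{i+1}$ (the order of one multiplicative group divides the order of the next), not the inclusion itself.

```python
from math import factorial

def tower_exponents(count):
    """Exponents r_1, ..., r_count with r_i = i! (the choice suggested in the text)."""
    return [factorial(i) for i in range(1, count + 1)]

def first_index_containing(n, exponents):
    """Smallest 1-based index i such that n divides r_i, or None if the list is too short."""
    for i, r in enumerate(exponents, start=1):
        if r % n == 0:
            return i
    return None

p = 3
rs = tower_exponents(7)

# Property (i): each r_i divides r_{i+1}.
assert all(b % a == 0 for a, b in zip(rs, rs[1:]))

# Property (ii), checked for n up to 7: n divides n!, so n divides some r_i.
assert all(first_index_containing(n, rs) is not None for n in range(1, 8))

# When a divides b, p^a - 1 divides p^b - 1, so the cyclic group of order
# p^b - 1 contains a subgroup of order p^a - 1.
assert all((p**b - 1) % (p**a - 1) == 0 for a, b in zip(rs, rs[1:]))
```

Any other sequence with properties (i) and (ii) would do equally well; factorials are simply the most convenient one to write down.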

The following theorem can be proved using Zorn's Lemma.

(9.4) Theorem. Every field F has an algebraic closure, and if $K_1$, $K_2$ are two
algebraic closures of F, there is an isomorphism $\varphi: K_1 \to K_2$ which is the identity map
on the subfield F. □

Thus the algebraic closure is essentially unique.

(9.5) Corollary. Let $\bar{F}$ be an algebraic closure of F, and let K be any algebraic
extension of F. There is a subextension $K' \subset \bar{F}$ isomorphic to K. □

Proof of the Fundamental Theorem of Algebra. To show that $f(x_0) = 0$, it is
enough to show that the absolute value $|f(x_0)|$ is zero. The existence of such a value
$x_0 \in \mathbb{C}$ is proved by the following two lemmas:

(9.6) Lemma. Let f(x) be a nonconstant polynomial, and let $x_0 \in \mathbb{C}$ be a point at
which $f(x_0) \ne 0$. Then $|f(x_0)|$ is not the minimum value of $|f(x)|$.

(9.7) Lemma. Let f(x) be a complex polynomial. Then $|f(x)|$ takes on a
minimum value at some point $x_0 \in \mathbb{C}$.

Proof of Lemma (9.6). We first note that the polynomial $x^k - c$ has a root for
all $c \in \mathbb{C}$. A nonnegative real number r has a real kth root because the continuous
function $x^k$, which is zero when x = 0 and large when x is a large real number,
takes on all real values $\ge 0$, by the Intermediate Value Theorem. We write the
complex number c in the form $c = re^{i\theta}$, where $r = |c|$ and $\theta = \arg c$. Let s be a real
kth root of r. Then the required kth root of c is

(9.8) $\alpha = s e^{i\theta/k}$.

Now let f(x) be a nonconstant polynomial, and let $x_0 \in \mathbb{C}$ be a point at which
$f(x_0) \ne 0$. It is convenient to normalize f. We make a change of variable, replacing
x by $x + x_0$, to shift the point in question to the origin: $x_0 = 0$. We also multiply
f(x) by $f(0)^{-1}$. Then f(0) = 1, and we must show that 1 is not the minimum value of
$|f(x)|$.

Let k denote the lowest nonzero power of x occurring in f, so that

$$f(x) = 1 + a x^k + (\text{terms of degree} > k).$$

Let $\alpha$ be a kth root of $-a^{-1}$. We make a final change of variable, replacing x by $\alpha x$.
Then f takes the form

$$f(x) = 1 - x^k + (\text{higher-degree terms}) = 1 - x^k + x^{k+1} g(x),$$

for some polynomial g(x). For small positive real x, the Triangle Inequality shows
that

$$|f(x)| \le |1 - x^k| + |x^{k+1} g(x)| = 1 - x^k + x^{k+1} |g(x)| = 1 - x^k (1 - x|g(x)|).$$

Since $x|g(x)|$ is small for small x, the term $x^k(1 - x|g(x)|)$ is positive when x is a
sufficiently small positive real number. For such x, $|f(x)| < 1 = |f(0)|$. □
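
The proof is constructive enough to run. The sketch below follows the same steps in floating-point arithmetic: shift $x_0$ to the origin, find the lowest nonzero power k, take a kth root as in (9.8), and move a small distance in that direction. The function names, the tolerance `tol`, and the halving strategy for the step size are assumptions made for this illustration; the lemma only asserts that some sufficiently small step works.

```python
import cmath
from math import comb

def evaluate(coeffs, x):
    """Horner evaluation; coeffs[i] is the coefficient of x**i."""
    result = 0j
    for c in reversed(coeffs):
        result = result * x + c
    return result

def shift(coeffs, x0):
    """Coefficients of f(x + x0), again in ascending order."""
    n = len(coeffs)
    return [sum(coeffs[i] * comb(i, j) * x0 ** (i - j) for i in range(j, n))
            for j in range(n)]

def kth_root(c, k):
    """One complex kth root of c, as in (9.8): s * exp(i*theta/k)."""
    r, theta = cmath.polar(c)
    return cmath.rect(r ** (1.0 / k), theta / k)

def smaller_value_point(coeffs, x0, tol=1e-12):
    """Return a point x1 with |f(x1)| < |f(x0)|, assuming f(x0) != 0."""
    c = shift(coeffs, x0)              # f(x0 + h) = c[0] + c[1] h + ...
    if abs(c[0]) <= tol:
        raise ValueError("f(x0) is (numerically) zero")
    k = next((j for j in range(1, len(c)) if abs(c[j]) > tol), None)
    if k is None:
        raise ValueError("f is constant")
    # After dividing by c[0], f = 1 + a h^k + ..., and alpha^k = -1/a.
    alpha = kth_root(-c[0] / c[k], k)
    t = 1.0
    for _ in range(200):               # halve the step until the value drops
        x1 = x0 + t * alpha
        if abs(evaluate(coeffs, x1)) < abs(c[0]):
            return x1
        t /= 2
    raise RuntimeError("no improvement found; input may be badly scaled")

# f(x) = x^3 - 2x + 2 at x0 = 1, where f(1) = 1.
coeffs = [2, -2, 0, 1]
x1 = smaller_value_point(coeffs, 1)
assert abs(evaluate(coeffs, x1)) < abs(evaluate(coeffs, 1))
```

Repeating this step does not by itself prove that a root exists; that is the role of Lemma (9.7), which supplies a point where the minimum is attained.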

Proof of Lemma (9.7). We may assume that f is not a constant polynomial. For
large x, f(x) is also large:

(9.9) $|f(x)| \to \infty$ as $|x| \to \infty$.

To prove this, the constant term of f is irrelevant, so we may suppose that it is zero.
Then f(x) is divisible by x: f(x) = xg(x). By induction on the degree, the assertion
is true for g(x), or else g(x) is constant, and it follows for f(x) as well.

Now since f(x) is large for large x, the greatest lower bound m of $|f(x)|$ in the
whole complex plane is also the greatest lower bound in a sufficiently large disc
$|x| \le r$. Since the disc is compact and $|f(x)|$ is a continuous function, it takes on a
minimum value in the disc. □

There are several other proofs of the Fundamental Theorem of Algebra, and 
one of them is particularly appealing, though it is not as easy to make precise as the 
one just given. We will present it in outline. As before, our problem is to prove that 
a nonconstant polynomial 

(9.10) $f(z) = z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0$

has a root. If $a_0 = 0$, then 0 is a root, so we may assume that $a_0 \ne 0$. We consider
the function $f: \mathbb{C} \to \mathbb{C}$ defined by the polynomial (9.10).

Let $C_r$ denote a circle of radius r about the origin. We study the images $f(C_r)$
of the circle $C_r$. To do this, we use polar coordinates, writing $z = re^{i\theta}$. Then $z^n =
r^n e^{in\theta}$. As $\theta$ runs from 0 to $2\pi$, the point z winds once around the circle of radius r.
At the same time, $n\theta$ runs from 0 to $2\pi n$, so the point $z^n$ winds n times around the
circle of radius $r^n$.

For sufficiently large r, the term $z^n$ is dominant in the expression (9.10), and
we will have

$$|f(z) - z^n| < \tfrac{1}{2} r^n.$$

The proof of this fact is similar to the proof of Lemma (9.6). For our purposes, the
factor $\tfrac{1}{2}$ could be replaced by any positive real number less than 1. This inequality
shows us that, as $z^n$ winds n times around the circle of radius $r^n$, f(z) also winds n
times around the origin. A good way to visualize this conclusion is with the
dog-on-a-leash model. If someone walks a dog n times around the block, the dog also goes
around n times, though following a different path. This will be true provided that the
leash is shorter than the radius of the block. Here $z^n$ represents the position of the
person at the time $\theta$, and f(z) represents the position of the dog. The length of the
leash is $\tfrac{1}{2} r^n$.
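
One explicit way to obtain the leash estimate, given here as a sketch and not necessarily the argument the author has in mind, is to bound the lower-order terms crudely. Write $A = |a_0| + \cdots + |a_{n-1}|$ (a symbol introduced only for this computation) and take $|z| = r$ with $r \ge 1$ and $r > 2A$:

```latex
\[
  |f(z) - z^{n}|
  = \Bigl|\sum_{i=0}^{n-1} a_i z^{i}\Bigr|
  \le \sum_{i=0}^{n-1} |a_i|\, r^{i}
  \le A\, r^{n-1}
  < \tfrac{1}{2}\, r^{n} .
\]
```

The middle step uses $r^i \le r^{n-1}$ for $i \le n - 1$, which holds because $r \ge 1$, and the last step is the assumption $A < r/2$.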

We now vary the radius r. Since f is a continuous function, the image $f(C_r)$
will vary continuously with r. When the radius r is very small, $f(C_r)$ makes a small
loop around the constant term $a_0$ of f. This small loop won't wind around the origin
at all. But as we just saw, $f(C_r)$ winds n times around the origin if r is large enough.
The only explanation for this is that for some intermediate radius r', $f(C_{r'})$ passes
through the origin. This means that for some point $\alpha$ on the circle $C_{r'}$, $f(\alpha) = 0$.
This number $\alpha$ is a root of f.

Note that all n loops have to cross the origin, which agrees with the fact that a 
polynomial of degree n has n roots. 
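
The change in winding number can be observed numerically. The sketch below samples the circle $C_r$, follows the argument of f(z) around the image curve, and counts the total number of turns. The sample count and the example polynomial are choices made for this illustration, and the function assumes that f has no zero on the circle being sampled.

```python
import cmath
import math

def evaluate(coeffs, z):
    """Horner evaluation; coeffs[i] is the coefficient of z**i."""
    result = 0j
    for c in reversed(coeffs):
        result = result * z + c
    return result

def winding_number(coeffs, r, samples=4000):
    """Number of times f(C_r) winds around 0, assuming f has no zero on C_r."""
    total = 0.0
    prev = evaluate(coeffs, complex(r, 0.0))
    for j in range(1, samples + 1):
        z = cmath.rect(r, 2 * math.pi * j / samples)
        cur = evaluate(coeffs, z)
        total += cmath.phase(cur / prev)   # small signed change in the argument
        prev = cur
    return round(total / (2 * math.pi))

f = [3, -4, 1]   # f(z) = z^2 - 4z + 3 = (z - 1)(z - 3), constant term a_0 = 3
for r in (0.5, 2.0, 4.0):
    print(r, winding_number(f, r))   # winding numbers 0, 1, 2
```

For this polynomial the count jumps exactly when r passes the absolute values 1 and 3 of the two roots, which is the behavior the sketch of the proof predicts: the image curve must cross the origin each time the winding number changes.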


I don't consider this algebra,
but this doesn't mean that algebraists can't do it.

Garrett Birkhoff 


EXERCISES 

1. Examples of Fields 

1. Let F be a field. Find all elements $a \in F$ such that $a = a^{-1}$.

2. Let K be a subfield of $\mathbb{C}$ which is not contained in $\mathbb{R}$. Prove that K is a dense subset of $\mathbb{C}$.

3. Let R be an integral domain containing a field F as subring and which is
finite-dimensional when viewed as vector space over F. Prove that R is a field.

4. Let F be a field containing exactly eight elements. Prove or disprove: The characteristic
of F is 2.

2. Algebraic and Transcendental Elements 

1. Let $\alpha$ be the real cube root of 2. Compute the irreducible polynomial for $1 + \alpha^2$ over $\mathbb{Q}$.

2. Prove Lemma (2.7), that $(1, \alpha, \alpha^2, \dots, \alpha^{n-1})$ is a basis of $F[\alpha]$.

3. Determine the irreducible polynomial for $\alpha = \sqrt{3} + \sqrt{5}$ over each of the following
fields.

(a) $\mathbb{Q}$ (b) $\mathbb{Q}(\sqrt{5})$ (c) $\mathbb{Q}(\sqrt{10})$ (d) $\mathbb{Q}(\sqrt{15})$



4. Let $\alpha$ be a complex root of the irreducible polynomial $x^3 - 3x + 4$. Find the inverse of
$\alpha^2 + \alpha + 1$ in $F(\alpha)$ explicitly, in the form $a + b\alpha + c\alpha^2$, $a, b, c \in \mathbb{Q}$.

5. Let $K = F(\alpha)$, where $\alpha$ is a root of the irreducible polynomial
$f(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$. Determine the element $\alpha^{-1}$ explicitly in terms of $\alpha$
and of the coefficients $a_i$.

6. Let $\beta = \zeta \sqrt[3]{2}$, where $\zeta = e^{2\pi i/3}$, and let $K = \mathbb{Q}(\beta)$. Prove that $-1$ can not be written as
a sum of squares in K.

3. The Degree of a Field Extension 

1. Let F be a field, and let $\alpha$ be an element which generates a field extension of F of degree
5. Prove that $\alpha^2$ generates the same extension.

2. Let $\zeta = e^{2\pi i/7}$, and let $\eta = e^{2\pi i/5}$. Prove that $\eta \notin \mathbb{Q}(\zeta)$.

3. Define Cn = e 27ri ’ n . Find the irreducible polynomial over of (a) 心， （ b) ☆， （ c) 心， 

(d) (e) ^ 0 , (f) ^ 

4* Let l n = e 27Ti ^ n . Determine the irreducible polynomial over Q( 6 ) of (a) ☆ ，（ b) 心， 

(cU 12 , 

5. Prove that an extension K of F of degree 1 is equal to F.

6. Let a be a positive rational number which is not a square in $\mathbb{Q}$. Prove that $\sqrt[4]{a}$ has degree
4 over $\mathbb{Q}$.

7. Decide whether or not i is in the field (a) $\mathbb{Q}(\sqrt{-2})$, (b) $\mathbb{Q}(\sqrt[4]{-2})$, (c) $\mathbb{Q}(\alpha)$, where
$\alpha^3 + \alpha + 1 = 0$.

8. Let K be a field generated over F by two elements $\alpha$, $\beta$ of relatively prime degrees m, n
respectively. Prove that $[K : F] = mn$.

9. Let $\alpha$, $\beta$ be complex numbers of degree 3 over $\mathbb{Q}$, and let $K = \mathbb{Q}(\alpha, \beta)$. Determine the
possibilities for $[K : \mathbb{Q}]$.

10. Let $\alpha$, $\beta$ be complex numbers. Prove that if $\alpha + \beta$ and $\alpha\beta$ are algebraic numbers, then
$\alpha$ and $\beta$ are also algebraic.

11. Let $\alpha$, $\beta$ be complex roots of irreducible polynomials $f(x), g(x) \in \mathbb{Q}[x]$. Let $F = \mathbb{Q}[\alpha]$
and $K = \mathbb{Q}[\beta]$. Prove that f(x) is irreducible in K if and only if g(x) is irreducible in F.

12. (a) Let $F \subset F' \subset K$ be field extensions. Prove that if $[K : F] = [K : F']$, then $F = F'$.
(b) Give an example showing that this need not be the case if F is not contained in F'.

13. Let $\alpha_1, \dots, \alpha_k$ be elements of an extension field K of F, and assume that they are all
algebraic over F. Prove that $F(\alpha_1, \dots, \alpha_k) = F[\alpha_1, \dots, \alpha_k]$.

14. Prove or disprove: Let $\alpha$, $\beta$ be elements which are algebraic over a field F, of degrees
d, e respectively. The monomials $\alpha^i \beta^j$ with $i = 0, \dots, d - 1$, $j = 0, \dots, e - 1$ form a
basis of $F(\alpha, \beta)$ over F.

15. Prove or disprove: Every algebraic extension is a finite extension. 

4. Constructions with Ruler and Compass

1. Express cos 15° in terms of square roots.

2. Prove that the regular pentagon can be constructed by ruler and compass (a) by field
theory, and (b) by finding an explicit construction.

3. Derive formula (4.12).

4. Determine whether or not the regular 9-gon is constructible by ruler and compass. 

5. Is it possible to construct a square whose area is equal to that of a given triangle? 

6. Let $\alpha$ be a real root of the polynomial $x^3 + 3x + 1$. Prove that $\alpha$ can not be constructed
by ruler and compass.

7. Given that $\pi$ is a transcendental number, prove the impossibility of squaring the circle
by ruler and compass. (This means constructing a square whose area is the same as the
area of a circle of unit radius.)

8. Prove the impossibility of "duplicating the cube," that is, of constructing the side length
of a cube whose volume is 2.

9. (a) Referring to the proof of Proposition (4.8), prove that the discriminant D is negative
if and only if the circles do not intersect.

(b) Determine the line which appears at the end of the proof of Proposition (4.8)
geometrically if D > 0 and also if D < 0.

10. Prove that if a prime integer p has the form $2^r + 1$, then it actually has the form
$2^{2^k} + 1$.

11. Let C denote the field of constructible real numbers. Prove that C is the smallest subfield
of $\mathbb{R}$ with the property that if $a \in C$ and a > 0, then $\sqrt{a} \in C$.

12. The points in the plane can be considered as complex numbers. Describe the set of con¬ 
structible points explicitly as a subset of C. 

13. Characterize the constructible real numbers in the case that three points are given in the 
plane to start with. 

*14. Let the rule for construction in three-dimensional space be as follows: 

(i) Three non-collinear points are given. They are considered to be constructed. 

(ii) One may construct a plane through three non-collinear constructed points, 

(iii) One may construct a sphere with center at a constructed point and passing through 
another constructed point. 

(iv) Points of intersection of constructed planes and spheres are considered to be
constructed if they are isolated points, that is, if they are not part of an intersection
curve.

Prove that one can introduce coordinates, and characterize the coordinates of the
constructible points.


5. Symbolic Adjunction of Roots 


1. Let F be a field of characteristic zero, let f' denote the derivative of a polynomial
$f \in F[x]$, and let g be an irreducible polynomial which is a common divisor of f and f'.
Prove that $g^2$ divides f.

2. For which fields F and which primes p does $x^p - x$ have a multiple root?

3. Let F be a field of characteristic p.

(a) Apply (5.7) to the polynomial $x^p + 1$.

(b) Factor this polynomial into irreducible factors in F[x].

4. Let $\alpha_1, \dots, \alpha_n$ be the roots of a polynomial $f \in F[x]$ of degree n in an extension field K.
Find the best upper bound that you can for $[F(\alpha_1, \dots, \alpha_n) : F]$.

6. Finite Fields

1, Identify the group 

2. Write out the addition and multiplication tables for $\mathbb{F}_4$ and for $\mathbb{Z}/(4)$, and compare them.

3. Find a thirteenth root of 3 in the field $\mathbb{F}_{13}$.

4. Determine the irreducible polynomial over $\mathbb{F}_2$ for each of the elements (6.12) of $\mathbb{F}_8$.

5. Determine the number of irreducible polynomials of degree 3 over the field $\mathbb{F}_3$.

6. (a) Verify that (6.9, 6.10, 6.13) are irreducible factorizations over $\mathbb{F}_2$.

(b) Verify that (6.11, 6.13) are irreducible factorizations over $\mathbb{Z}$.

7. Factor $x^9 - x$ and $x^{27} - x$ in $\mathbb{F}_3$. Prove that your factorizations are irreducible.

8. Factor the polynomial $x^{16} - x$ in the fields (a) $\mathbb{F}_4$ and (b) $\mathbb{F}_8$.

9. Determine all polynomials f(x) in $\mathbb{F}_q[x]$ such that $f(\alpha) = 0$ for all $\alpha \in \mathbb{F}_q$.

10. Let K be a finite field. Prove that the product of the nonzero elements of K is -1.

11. Prove that every element of $\mathbb{F}_p$ has exactly one pth root.

12. Complete the proof of Proposition (6.19) by showing that the difference $\alpha - \beta$ of two
roots of $x^q - x$ is a root of the same polynomial.

13. Let p be a prime. Describe the integers n such that there exist a finite field K of order n
and an element $\alpha \in K^\times$ whose order in $K^\times$ is p.

14. Work this problem without appealing to Theorem (6.4). 

(a) Let $F = \mathbb{F}_p$. Determine the number of monic irreducible polynomials of degree 2 in
F[x].

(b) Let f(x) be one of the polynomials described in (a). Prove that K = F[x]/(f) is a
field containing $p^2$ elements and that the elements of K have the form $a + b\alpha$, where
$a, b \in F$ and $\alpha$ is a root of f in K. Show that every such element $a + b\alpha$ with $b \ne 0$
is the root of an irreducible quadratic polynomial in F[x].

(c) Show that every polynomial of degree 2 in F[x] has a root in K. 

(d) Show that all the fields K constructed as above for a given prime p are isomorphic. 

15. The polynomials $f(x) = x^3 + x + 1$, $g(x) = x^3 + x^2 + 1$ are irreducible over $\mathbb{F}_2$. Let
K be the field extension obtained by adjoining a root of f, and let L be the extension
obtained by adjoining a root of g. Describe explicitly an isomorphism from K to L.

16. (a) Prove Lemma (6.21) for the case $F = \mathbb{C}$ by looking at the roots of the two
polynomials.

(b) Use the principle of permanence of identities to derive the conclusion when F is an
arbitrary ring.

7. Function Fields

1. Determine a real polynomial in three variables whose locus of zeros is the projected 
Riemann surface (7.9). 

2. Prove that the set $\mathcal{F}(U)$ of continuous functions on U' forms a ring.

3. Let/(x) be a polynomial in F[x], where F is a field. Prove that if there is a rational func¬ 
tion r(x) such that r 2 = f, then r is a polynomial. 

4. Referring to the proof of Proposition (7.11), explain why the map $F \to \mathcal{F}(S)$ defined
by $g(x) \mapsto g(x)$ is a homomorphism.

5. Determine the branch points and the gluing data for the Riemann surfaces of the
following polynomials.

(a) y 2 — x 2 + l (b) y 5 — x (c) 〆 一 x — 1 (d) y 3 — xy — x 

(e) y 3 - y 2 - x (f) }； 3 - x(x - 1) (g) y 3 - x(x - l) 2 (h) 〆 + 巧 2 + x 
(i) x 2 y 2 — xy — x 

6. (a) Determine the number of isomorphism classes of function fields K of degree 3 over
$F = \mathbb{C}(x)$ which are ramified only at the points ±1.

(b) Describe the gluing data for the Riemann surface corresponding to each isomorphism
class of fields as a pair of permutations.

(c) For each isomorphism class, determine a polynomial f(x, y) such that K = F[x]/(f)
represents the isomorphism class.

*7. Prove the Riemann Existence Theorem for quadratic extensions.

*8. Let S be a branched covering constructed with branch points $\alpha_1, \dots, \alpha_r$, curves
$C_1, \dots, C_r$, and permutations $\sigma_1, \dots, \sigma_r$. Prove that S is connected if and only if the
subgroup $\Sigma$ of the symmetric group $S_n$ which is generated by the permutations $\sigma_\nu$ operates
transitively on the indices $1, \dots, n$.

*9. It can be shown that the Riemann surface S of a function field is homeomorphic to the
complement of a finite set of points in a compact oriented two-dimensional manifold $\bar{S}$.
The genus of such a surface is defined to be the number of holes in the corresponding
manifold $\bar{S}$. So if $\bar{S}$ is a sphere, the genus of S is 0, while if $\bar{S}$ is a torus, the genus of S is
1. The genus of a function field is defined to be the genus of its Riemann surface.
Determine the genus of the field defined by each polynomial.

(a) $y^2 - (x^2 - 1)(x^2 - 4)$ (b) $y^2 - x(x^2 - 1)(x^2 - 4)$ (c) $y^3 + y + x$

(d) $y^3 - x(x - 1)$ (e) $y^3 - x(x - 1)^2$

8. Transcendental Extensions 


1. Let $K = F(\alpha)$ be a field extension generated by an element $\alpha$, and let $\beta \in K$, $\beta \notin F$.
Prove that $\alpha$ is algebraic over the field $F(\beta)$.

2. Prove that the isomorphism $\mathbb{Q}(\pi) \to \mathbb{Q}(e)$ sending $\pi \mapsto e$ is discontinuous.

3. Let $F \subset K \subset L$ be fields. Prove that $\operatorname{tr\,deg}_F L = \operatorname{tr\,deg}_F K + \operatorname{tr\,deg}_K L$.

4. Let $(\alpha_1, \dots, \alpha_n) \subset K$ be an algebraically independent set over F. Prove that an element
$\beta \in K$ is transcendental over $F(\alpha_1, \dots, \alpha_n)$ if and only if $(\alpha_1, \dots, \alpha_n; \beta)$ is algebraically
independent.

5. Prove Theorem (8.3).

9. Algebraically Closed Fields 

1. Derive Corollary (9.5) from Theorem (9.4).

2. Prove that the field F constructed in this text as the union of finite fields is algebraically
closed.

*3. With notation as at the end of the section, a comparison of the images $f(C_r)$ for varying
radii shows another interesting geometric feature: For large r, the curve $f(C_r)$ has n
loops. This can be expressed formally by saying that its total curvature is $2\pi n$. For small
r, the linear term $a_1 z + a_0$ dominates f(z). Then $f(C_r)$ makes a single loop around $a_0$. Its
total curvature is only $2\pi$. Something happens to the loops and the curvature, as r varies.
Explain.

*4. If you have access to a computer with a good graphics system, use it to illustrate the
variation of $f(C_r)$ with r. Use log-polar coordinates $(\log r, \theta)$.

Miscellaneous Exercises 

1. Let f(x) be an irreducible polynomial of degree 6 over a field F, and let K be a quadratic
extension of F. Prove or disprove: Either f is irreducible over K, or else f is a product of
two irreducible cubic polynomials over K.

2. (a) Let p be an odd prime. Prove that exactly half of the elements of $\mathbb{F}_p^\times$ are squares
and that if $\alpha$, $\beta$ are nonsquares, then $\alpha\beta$ is a square.

(b) Prove the same as (a) for any finite field of odd order.

(c) Prove that in a finite field of even order, every element is a square.

3. Write down the irreducible polynomial for $\alpha = \sqrt{2} + \sqrt{3}$ over $\mathbb{Q}$ and prove that it is
reducible modulo p for every prime p.

*4. (a) Prove that any element of $GL_2(\mathbb{Z})$ of finite order has order 1, 2, 3, 4, or 6.

(b) Extend this theorem to $GL_3(\mathbb{Z})$, and show that it fails in $GL_4(\mathbb{Z})$.

5. Let c be a real number, not ±2. The plane curve $C: x^2 + cxy + y^2 = 1$ can be
parametrized rationally. To do this, we choose the point (0, 1) on C and parametrize the
lines through this point by their slope: $L_t: y = tx + 1$. The point at which the line $L_t$
intersects C can be found algebraically.

(a) Find the equation of this point explicitly.

(b) Use this procedure to find all solutions of the equation $x^2 + cxy + y^2 = 1$ in the
field $F = \mathbb{F}_p$, when c is in that field and $c \ne \pm 2$.

(c) Show that the number of solutions is $p - 1$, p, or $p + 1$, and describe how this
number depends on the roots of the polynomial $t^2 + ct + 1$.

6. The degree of a rational function $f(x) = p(x)/q(x) \in \mathbb{C}(x)$ is defined to be the
maximum of the degrees of p and q, when p, q are chosen to be relatively prime. Every
rational function f defines a map $P' \to P'$ by $x \mapsto f(x)$. We will denote this map by f
too.

(a) Suppose that f has degree d. Show that for any point $y_0$ in the plane, the fibre $f^{-1}(y_0)$
contains at most d points.

(b) Show that $f^{-1}(y_0)$ consists of precisely d points, except for a finite number of $y_0$.
Identify the values $y_0$ where there are fewer than d points in terms of f and df/dx.

*7. (a) Prove that a rational function f(x) generates the field of rational functions $\mathbb{C}(x)$ if and
only if it is of the form (ax + b)/(cx + d), with $ad - bc \ne 0$.

(b) Identify the group of automorphisms of $\mathbb{C}(x)$ which are the identity on $\mathbb{C}$.

*8. Let K/F be an extension of degree 2 of rational function fields, say $K = \mathbb{C}(t)$ and
$F = \mathbb{C}(x)$. Prove that there are generators x', t' for the two fields, such that
$t = (\alpha t' + \beta)/(\gamma t' + \delta)$ and $x = (ax' + b)/(cx' + d)$, $\alpha, \beta, \gamma, \delta, a, b, c, d \in \mathbb{C}$,
such that $t'^2 = x'$.

*9. Fill in the following outline to give an algebraic proof of the fact that
$K = \mathbb{C}(x)[y]/(y^2 - x^3 + x)$ is not a pure transcendental extension of $\mathbb{C}$. Suppose that
$K = \mathbb{C}(t)$ for some t. Then x and y are rational functions of t.

(a) Using the result of the previous problem and replacing t by t' as necessary, reduce to
the case that $x = (at^2 + b)/(ct^2 + d)$.

(b) Say that y = p(t)/q(t). Then the equation $y^2 = x(x + 1)(x - 1)$ reads

$$\frac{p(t)^2}{q(t)^2} = \frac{(at^2 + b)\,((a + c)t^2 + b + d)\,((a - c)t^2 + b - d)}{(ct^2 + d)^3}.$$

Either the numerators and denominators on the two sides agree, or else there is
cancellation on the right side.

(c) Complete the proof by analyzing the two possibilities given in (b).

*10. (a) Prove that the homomorphism $SL_2(\mathbb{Z}) \to SL_2(\mathbb{F}_p)$ obtained by reducing the matrix
entries modulo p is surjective.

(b) Prove the analogous assertion for $SL_n$.

*11. Determine the conjugacy classes of elements of order 2 in $GL_n(\mathbb{Z})$.

Galois Theory


En un mot les calculs sont impraticables.

Evariste Galois 


1. THE MAIN THEOREM OF GALOIS THEORY

In the last chapter we studied algebraic field extensions, using extensions generated
by a single element as the basic tool. This amounts to studying the properties of a
single root of an irreducible polynomial

(1.1) $f(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_0$.

Galois theory, the topic of this chapter, is the theory of all the roots of such a
polynomial and of the symmetries among them.

We will restrict our attention to fields of characteristic zero in this chapter. It 
is to be understood that all fields occurring have characteristic zero, and we will not 
mention this assumption explicitly from now on. 

The notation K/F will indicate that K is an extension field of F. This notation
is traditional, though there is some danger of confusion with the notation R/I for the
quotient of a ring R by an ideal I.

As we have seen, computation in a field $F(\alpha)$ generated by a single root can
easily be made by identifying it with the formally constructed field F[x]/(f). But
suppose that an irreducible polynomial f(x) factors into linear factors in a field
extension K, and that its roots in K are $\alpha_1, \dots, \alpha_n$. How to compute with all these roots at
the same time isn't clear. To do so we have to know how the roots are related, and
this depends on the particular case. In principle, the relations can be obtained by
expanding the equation $f(x) = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$. Doing so, we find that
the sum of the roots is $-a_{n-1}$, that their product is $\pm a_0$, and so on. However, it may
not be easy to interpret these relations directly.
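
To make the relations concrete, here is the expansion written out for n = 3. It is added only as an illustration of the general statement, with the coefficients named as in (1.1):

```latex
\begin{align*}
(x-\alpha_1)(x-\alpha_2)(x-\alpha_3)
  &= x^3 - (\alpha_1+\alpha_2+\alpha_3)\,x^2
     + (\alpha_1\alpha_2+\alpha_1\alpha_3+\alpha_2\alpha_3)\,x
     - \alpha_1\alpha_2\alpha_3 ,\\
a_2 &= -(\alpha_1+\alpha_2+\alpha_3), \qquad
a_1 = \alpha_1\alpha_2+\alpha_1\alpha_3+\alpha_2\alpha_3, \qquad
a_0 = -\alpha_1\alpha_2\alpha_3 .
\end{align*}
```

In general the sign of the product is $(-1)^n$, which accounts for the $\pm$ in the statement above.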

The fundamental discovery which arose through the work of several people,
especially of Lagrange and Galois, is that the relationships between the roots can be
understood in terms of symmetry. The original model for this symmetry is complex
conjugation, which permutes the two roots $\pm i$ of the irreducible real polynomial
$x^2 + 1$, while leaving the real numbers fixed. We will begin by observing that such a
symmetry exists for any quadratic field extension.

An extension K/F of degree 2 is generated by any element $\alpha$ of K which is not
in F. Moreover, $\alpha$ is a root of an irreducible quadratic polynomial

(1.2) $f(x) = x^2 + bx + c$

with coefficients in F. Then $\alpha' = -b - \alpha$ is also a root of f, so this polynomial
splits into linear factors over K: $f(x) = (x - \alpha)(x - \alpha')$.

The fact that $\alpha$ and $\alpha'$ are roots of the same irreducible polynomial provides
us with our symmetry. According to Proposition (2.9) of Chapter 13, there is an
isomorphism

(1.3) $\sigma: F(\alpha) \to F(\alpha')$,

which is the identity on F and which sends $\alpha \mapsto \alpha'$. But either root generates the
extension: $F(\alpha) = K = F(\alpha')$. Therefore $\sigma$ is an automorphism of K.

This automorphism switches the two roots $\alpha$, $\alpha'$. For, since $\sigma$ is the identity
on F, it fixes b, and $\alpha + \alpha' = -b$. So if $\sigma(\alpha) = \alpha'$, we must have $\sigma(\alpha') = \alpha$. It
follows that $\sigma^2$ sends $\alpha \mapsto \alpha$ and, since $\alpha$ generates K over F, that $\sigma^2$ is the
identity.

Note also that $\sigma$ is not the identity automorphism, because the two roots $\alpha$, $\alpha'$
are distinct. If $\alpha$ were a double root of the quadratic polynomial (1.2), the quadratic
formula would give $\alpha = -\tfrac{1}{2} b$. This would imply $\alpha \in F$, contrary to our hypothesis
that f is irreducible.


Since our field F is assumed to have characteristic zero, the quadratic extension
K can be obtained by adjoining a square root $\delta$ of the discriminant $D = b^2 - 4c$, a
root of the irreducible polynomial $x^2 - D$. Its other root is $-\delta$, and $\sigma$ interchanges
the two square roots.

Whenever K is obtained by adjoining a square root $\delta$, there is an
automorphism which sends $\delta \mapsto -\delta$. For example, let $\alpha = 1 + \sqrt{2}$, and let $K = \mathbb{Q}(\alpha)$.
The irreducible polynomial for $\alpha$ over $\mathbb{Q}$ is $x^2 - 2x - 1$, and the other root of this
polynomial is $\alpha' = 1 - \sqrt{2}$. There is an automorphism $\sigma$ of K which sends
$\sqrt{2} \mapsto -\sqrt{2}$ and $\alpha \mapsto \alpha'$. It is important to note right away that such an
automorphism will not be continuous when K is considered as a subfield of $\mathbb{R}$. It is a
symmetry of the algebraic structure of K, but it does not respect the geometry given
by the embedding of K into the real line.
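
This example can be checked with exact arithmetic. In the sketch below an element $a + b\sqrt{2}$ of K is stored as a pair of rationals; the class name `QSqrt2` and the helper names are inventions for this illustration. The code verifies that both $\alpha$ and $\alpha'$ are roots of $x^2 - 2x - 1$ and that the sign change on $\sqrt{2}$ respects sums and products on sample elements.

```python
from fractions import Fraction

class QSqrt2:
    """Element a + b*sqrt(2) of Q(sqrt 2), with rational a and b."""
    def __init__(self, a, b=0):
        self.a, self.b = Fraction(a), Fraction(b)
    def __add__(self, other):
        return QSqrt2(self.a + other.a, self.b + other.b)
    def __sub__(self, other):
        return QSqrt2(self.a - other.a, self.b - other.b)
    def __mul__(self, other):
        # (a + b r)(c + d r) = (ac + 2bd) + (ad + bc) r, where r*r = 2
        return QSqrt2(self.a * other.a + 2 * self.b * other.b,
                      self.a * other.b + self.b * other.a)
    def __eq__(self, other):
        return (self.a, self.b) == (other.a, other.b)
    def __repr__(self):
        return f"{self.a} + {self.b}*sqrt2"

def sigma(x):
    """The automorphism sending sqrt(2) to -sqrt(2)."""
    return QSqrt2(x.a, -x.b)

def min_poly(x):
    """Evaluate x^2 - 2x - 1."""
    return x * x - QSqrt2(2) * x - QSqrt2(1)

alpha = QSqrt2(1, 1)                       # alpha = 1 + sqrt(2)
assert min_poly(alpha) == QSqrt2(0)
assert sigma(alpha) == QSqrt2(1, -1)       # alpha' = 1 - sqrt(2)
assert min_poly(sigma(alpha)) == QSqrt2(0)

x, y = QSqrt2(Fraction(1, 2), 3), QSqrt2(-2, Fraction(5, 7))
assert sigma(x * y) == sigma(x) * sigma(y)
assert sigma(x + y) == sigma(x) + sigma(y)
assert sigma(sigma(x)) == x
```

Checking a few sample elements is of course not a proof that `sigma` is an automorphism; it only confirms the identities that the argument in the text establishes in general.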

By definition, an F-automorphism of an extension field K is an automorphism
which is the identity on the subfield F [see Chapter 13 (2.10)]. In other words, an
automorphism $\sigma$ of K is an F-automorphism if $\sigma(c) = c$ for all $c \in F$. Thus
complex conjugation is an $\mathbb{R}$-automorphism of $\mathbb{C}$, and the symmetry $\sigma$ we have just
found is an F-automorphism of the quadratic extension K. It is not difficult to show
that $\sigma$ is the only F-automorphism of this extension other than the identity.

The group of all F-automorphisms of K is called the Galois group of the field
extension. We often denote this group by G(K/F). When K/F is a quadratic
extension, the Galois group G(K/F) is a group of order 2.

Let us now consider the next simplest example, that of a biquadratic extension.
We will call a field extension K/F biquadratic if [K : F] = 4 and if K is generated by
the roots of two irreducible quadratic polynomials. Every such extension has the
form

(1.4) $K = F(\alpha, \beta)$,

where $\alpha^2 = a$ and $\beta^2 = b$, and where a, b are elements of F. The element $\beta$
generates an intermediate field, that is, a field $F(\beta)$ between F and K. Since $K = F(\alpha, \beta)$, the
requirement that [K : F] = 4 implies that $F(\beta)$ has degree 2 over F and that $\alpha$ is not
in the field $F(\beta)$. So the polynomial $x^2 - a$ is irreducible over $F(\beta)$. Similarly, the
polynomial $x^2 - b$ is irreducible over the intermediate field $F(\alpha)$.

Notice that K is an extension of $F(\beta)$ of degree 2, generated by $\alpha$. Let us apply
what we have just learned about quadratic extensions to this extension. Substituting
$F(\beta)$ for F, we find that there is an $F(\beta)$-automorphism of K which interchanges the
two roots $\pm\alpha$ of $x^2 - a$. Call this automorphism $\sigma$. Since it is the identity on $F(\beta)$,
$\sigma$ is also the identity on F, so it is an F-automorphism too. Similarly, there is an
$F(\alpha)$-automorphism $\tau$ of K which interchanges the roots $\pm\beta$ of $x^2 - b$, and $\tau$ is also
an F-automorphism.

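The two automorphisms we have found operate on the roots $\alpha$, $\beta$ as follows:

(1.5)    $\sigma$: $\alpha \mapsto -\alpha$, $\beta \mapsto \beta$;    $\tau$: $\alpha \mapsto \alpha$, $\beta \mapsto -\beta$.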

Composing these operations, we find that $\sigma\tau$ changes the signs of both roots $\alpha$, $\beta$
and that the automorphisms $\sigma^2$, $\tau^2$, and $\sigma\tau\sigma\tau$ leave $\alpha$ and $\beta$ fixed. Since K is
generated over F by the roots, these last three automorphisms are all equal to the identity.
Therefore the four automorphisms $\{1, \sigma, \tau, \sigma\tau\}$ form a group of order 4, with
relations

$$\sigma^2 = 1, \quad \tau^2 = 1, \quad \sigma\tau = \tau\sigma.$$


We have shown that the Galois group G(K/F) contains the Klein four group. In fact
it is equal to that group, as we shall see in a moment.

For example, let $F = \mathbb{Q}$, $\alpha = i$, and $\beta = \sqrt{2}$, so that $K = \mathbb{Q}(i, \sqrt{2})$. In this
case, the automorphism $\sigma$ is complex conjugation, while $\tau$ sends $\sqrt{2} \mapsto -\sqrt{2}$,
fixing $i$.
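
The example can be checked mechanically. The following is a minimal sketch, not part of the text: it assumes we store an element $c_1 + c_2 i + c_3\sqrt{2} + c_4 i\sqrt{2}$ of $\mathbb{Q}(i, \sqrt{2})$ as a tuple of four rational coefficients, and the names `mul`, `sigma`, `tau`, `compose`, `same_map` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

# An element c1 + c2*i + c3*r + c4*i*r of Q(i, r), r = sqrt(2),
# is stored as the tuple (c1, c2, c3, c4) of rational coefficients.

def mul(a, b):
    """Product in Q(i, sqrt 2), using i^2 = -1, r^2 = 2, (i*r)^2 = -2."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (
        a1*b1 - a2*b2 + 2*a3*b3 - 2*a4*b4,    # coefficient of 1
        a1*b2 + a2*b1 + 2*(a3*b4 + a4*b3),    # coefficient of i
        a1*b3 + a3*b1 - (a2*b4 + a4*b2),      # coefficient of sqrt 2
        a1*b4 + a4*b1 + a2*b3 + a3*b2,        # coefficient of i*sqrt 2
    )

def sigma(x):
    """Complex conjugation: i -> -i, sqrt 2 fixed."""
    c1, c2, c3, c4 = x
    return (c1, -c2, c3, -c4)

def tau(x):
    """sqrt 2 -> -sqrt 2, i fixed."""
    c1, c2, c3, c4 = x
    return (c1, c2, -c3, -c4)

def compose(f, g):
    return lambda x: f(g(x))

def identity(x):
    return x

# The basis (1, i, sqrt 2, i*sqrt 2) of K over Q
BASIS = [tuple(Fraction(int(i == j)) for j in range(4)) for i in range(4)]

def same_map(f, g):
    """Two Q-linear maps agree if they agree on a basis."""
    return all(f(b) == g(b) for b in BASIS)

def respects_products(f):
    """Check f(ab) = f(a)f(b) on all pairs of basis elements."""
    return all(f(mul(a, b)) == mul(f(a), f(b)) for a in BASIS for b in BASIS)
```

Since both maps are $\mathbb{Q}$-linear, checking products on the basis is enough to see that they are field automorphisms, and the relations $\sigma^2 = \tau^2 = 1$, $\sigma\tau = \tau\sigma$ can be tested with `same_map`.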

For quadratic or biquadratic extensions, the degree $[K:F]$ is equal to the order
of the Galois group $G(K/F)$. We will now state two theorems, Theorems (1.6) and
(1.11), which describe the general circumstances under which this happens. These
theorems will be proved in later sections of the chapter.

(1.6) Theorem. For any finite extension $K/F$, the order $|G(K/F)|$ of the Galois
group divides the degree $[K:F]$ of the extension.

A finite field extension $K/F$ is called a Galois extension if the order of the Galois
group is equal to the degree:

(1.7)    $|G(K/F)| = [K:F]$.

Theorem (1.6) shows that the Galois group of a biquadratic extension has order at
most 4. Since we already have four automorphisms in hand, there are no others, and
the Galois group is the Klein four group, as was asserted. All quadratic and biquadratic
extensions are Galois.

If $G$ is a group of automorphisms of a field $K$, the set of elements of $K$ which
are fixed by all the automorphisms in $G$ forms a subfield, called the fixed field of $G$.
The fixed field is often denoted by $K^G$:

(1.8)    $K^G = \{\alpha \in K \mid \varphi(\alpha) = \alpha \text{ for all } \varphi \in G\}$.

One consequence of Theorem (1.6) is that when $K/F$ is a Galois extension, the only
elements of $K$ which are fixed by the whole Galois group are the elements of $F$:

(1.9) Corollary. Let $K/F$ be a Galois extension, with Galois group $G = G(K/F)$.
The fixed field of $G$ is $F$.

For let $L$ denote the fixed field. Then $F \subset L$, and this inclusion shows that every
$L$-automorphism of $K$ is also an $F$-automorphism, that is, that $G(K/L) \subset G$. On the
other hand, by definition of the fixed field, every element of $G$ is an $L$-automorphism.
So $G(K/L) = G$. Now $|G| = [K:F]$ because $K/F$ is a Galois extension,
and by Theorem (1.6), $|G|$ divides $[K:L]$. Since $F \subset L \subset K$, this shows that
$[K:F] = [K:L]$, hence that $F = L$. □

This corollary is important because it provides a method for checking that an element
of a Galois extension $K$ is actually in the field $F$. We will use it frequently.

Being Galois is a strong restriction on a field extension, but nevertheless there
are many Galois extensions. This is the key fact which led to Galois' theory. In order
to state the theorem which describes the Galois extensions, we need one more
definition.

(1.10) Definition. Let $f(x) \in F[x]$ be a nonconstant monic polynomial. A splitting
field for $f(x)$ over $F$ is an extension field $K$ of $F$ such that

(i) $f(x)$ factors into linear factors in $K$: $f(x) = (x - \alpha_1) \cdots (x - \alpha_n)$, with $\alpha_i \in K$;

(ii) $K$ is generated by the roots of $f(x)$: $K = F(\alpha_1, \ldots, \alpha_n)$.

The second condition just says that $K$ is the smallest extension of $F$ which contains
all the roots. The biquadratic extension (1.4) is a splitting field of the polynomial
$f(x) = (x^2 - a)(x^2 - b)$.

Every polynomial $f(x) \in F[x]$ has a splitting field. To find one, we choose a
field extension $L$ in which $f$ splits into linear factors [Chapter 13 (5.3)] and then take
for $K$ the subfield $F(\alpha_1, \ldots, \alpha_n)$ of $L$ generated by the roots.

(1.11) Theorem. If $K$ is a splitting field of a polynomial $f(x)$ over $F$, then $K$ is a
Galois extension of $F$. Conversely, every Galois extension is a splitting field of some
polynomial $f(x) \in F[x]$.

(1.12) Corollary. Every finite extension is contained in a Galois extension. 

To derive this corollary from the theorem, let $K/F$ be a finite extension, let
$\alpha_1, \ldots, \alpha_n$ be generators for $K$ over $F$, and let $f_i$ be the monic irreducible polynomial
for $\alpha_i$ over $F$. We extend $K$ to a splitting field $L$ of the product $f = f_1 \cdots f_n$ over
$K$. Then $L$ will also be a splitting field of $f$ over $F$. So $L$ is the required Galois extension. □

(1.13) Corollary. Let $K/F$ be a Galois extension, and let $L$ be an intermediate
field: $F \subset L \subset K$. Then $K/L$ is a Galois extension too.

For, if $K$ is the splitting field of a polynomial $f(x)$ over $F$, then it is also the splitting
field of the same polynomial over the larger field $L$, so $K$ is a Galois extension of
$L$. □


Let us go back to biquadratic extensions. We can prove that the Galois group of
such an extension has order 4 without appealing to Theorem (1.6). All that is needed
is the following elementary proposition:

(1.14) Proposition.

(a) Let $K$ be an extension of a field $F$, let $f(x)$ be a polynomial with coefficients in
$F$, and let $\sigma$ be an $F$-automorphism of $K$. If $\alpha$ is a root of $f(x)$ in $K$, then $\sigma(\alpha)$
is also a root.

(b) Let $K$ be a field extension generated over $F$ by elements $\alpha_1, \ldots, \alpha_n$, and let $\sigma$ be
an $F$-automorphism of $K$. If $\sigma$ fixes each of the generators $\alpha_i$, then $\sigma$ is the
identity automorphism.

(c) Let $K$ be a splitting field of a polynomial $f(x)$ over $F$. The Galois group
$G(K/F)$ operates faithfully on the set $\{\alpha_1, \ldots, \alpha_n\}$ of roots of $f(x)$.

Proof. Part (a) was proved in the last chapter [Chapter 13 (2.10)]. To prove
part (b), assume that $K$ is generated by $\alpha_1, \ldots, \alpha_n$. Then every element of $K$ can be
expressed as a polynomial in $\alpha_1, \ldots, \alpha_n$ with coefficients in $F$ [Chapter 13 (2.6b)]. If
$\sigma$ is an automorphism which is the identity on $F$ and which also fixes each of the
elements $\alpha_i$, then it fixes every polynomial in $\{\alpha_i\}$ with coefficients in $F$; hence it is the
identity. The third assertion (c) follows from the first two: The first tells us that every
$\sigma \in G(K/F)$ permutes the set $\{\alpha_1, \ldots, \alpha_n\}$, and the second tells us that the operation
on this set is faithful. □

Proposition (1.14) does not address the most interesting question: Which permutations
of the roots of a polynomial extend to automorphisms of the splitting field?
This question is the central theme of Galois theory.

Let us apply Proposition (1.14) to the biquadratic extension (1.4). Part (a), applied
to the polynomial $x^2 - a$, shows that any $F$-automorphism $\varphi$ of $K$ permutes
the roots $\pm\alpha$. Similarly, $\varphi$ permutes $\pm\beta$. Only four permutations of $\{\pm\alpha, \pm\beta\}$ act
in this way. Since the elements $\alpha, \beta$ generate $K$, (1.14b) tells us that an $F$-automorphism
which fixes both of them is the identity. So the four automorphisms which we
have already found are the only ones. This proves that $G(K/F)$ is the Klein four
group.

One of the most important parts of Galois theory is the determination of the
intermediate fields $L$, those sandwiched between $F$ and $K$: $F \subset L \subset K$. The Main
Theorem of Galois theory asserts that when $K/F$ is a Galois extension, the intermediate
fields are in bijective correspondence with the subgroups of the Galois group.
The importance of this correspondence is not immediately clear. We will have to see
it used to understand it.

The intermediate field corresponding to a subgroup $H$ of $G(K/F)$ is the fixed
field $K^H$ of $H$, which was defined above. In the other direction, if $L$ is an intermediate
field, the Galois group $G(K/L)$ is a subgroup of $G(K/F)$. This is the subgroup
which corresponds to $L$.


(1.15) Theorem. The Main Theorem: Let $K$ be a Galois extension of a field $F$, and
let $G = G(K/F)$ be its Galois group. The function

    $H \mapsto K^H$

is a bijective map from the set of subgroups of $G$ to the set of intermediate fields
$F \subset L \subset K$. Its inverse function is

    $L \mapsto G(K/L)$.

This correspondence has the property that if $H = G(K/L)$, then

(1.16)    $[K:L] = |H|$, hence $[L:F] = [G:H]$.

We will prove this theorem in Section 5.


The fields $F$ and $K$ are included among the intermediate fields. The subgroup
which corresponds to the field $F$ is the whole group $G$ [see (1.9)], and the one corresponding
to $K$ is the trivial subgroup $\{1\}$.

Let us go back to our example of the biquadratic extension $K = \mathbb{Q}(i, \sqrt{2})$, for
which $\sigma$ is complex conjugation, while $\tau$ interchanges $\sqrt{2}$ and $-\sqrt{2}$. Its Galois
group, the Klein four group, has three proper subgroups:

    $H_1 = \{1, \sigma\}$,  $H_2 = \{1, \tau\}$,  $H_3 = \{1, \sigma\tau\}$.

According to the Main Theorem, there are three proper intermediate fields, namely
the fixed fields $L_i$ of these subgroups. They are easily determined:

    $L_1 = \mathbb{Q}(\sqrt{2})$,  $L_2 = \mathbb{Q}(i)$,  and  $L_3 = \mathbb{Q}(i\sqrt{2})$.

A Galois group is finite, so it has finitely many subgroups. But without the
Main Theorem, it isn't obvious that there are only finitely many intermediate fields.
It might seem natural to expect two randomly chosen elements of a Galois extension
$K/F$ to generate different subfields. This tends not to happen, and in fact most elements
will generate the whole extension $K$. The case of the biquadratic extension
$K = \mathbb{Q}(i, \sqrt{2})$ will illustrate this point. Let $\gamma$ be any element of $K$. The field $\mathbb{Q}(\gamma)$
generated by $\gamma$ must be one of the intermediate fields we have found. So if $\gamma$ is not
contained in $\mathbb{Q}(i)$, $\mathbb{Q}(\sqrt{2})$, or $\mathbb{Q}(i\sqrt{2})$, then $\mathbb{Q}(\gamma) = K$. Now the set
$(1, i, \sqrt{2}, i\sqrt{2})$ is a basis for $K$ over $F$, so we may write an arbitrary element $\gamma$ in
the form

    $\gamma = c_1 + c_2 i + c_3\sqrt{2} + c_4 i\sqrt{2}$,  with $c_i \in \mathbb{Q}$.

This element is not in one of the three proper intermediate fields unless two of the
coefficients $c_2, c_3, c_4$ are zero. The element $i + \sqrt{2}$, for example, generates the
whole extension $K$. We will return to this point in Section 4.
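
The criterion above can be phrased through the Main Theorem: $\mathbb{Q}(\gamma)$ is the fixed field of the subgroup of elements fixing $\gamma$, so $\gamma$ generates $K$ exactly when only the identity fixes it. A minimal sketch, assuming $\gamma$ is given by its coefficient tuple $(c_1, c_2, c_3, c_4)$; the function names `stabilizer` and `generates_K` are hypothetical.

```python
from fractions import Fraction

def sigma(c):
    """Complex conjugation on coefficients of (1, i, sqrt 2, i*sqrt 2)."""
    return (c[0], -c[1], c[2], -c[3])

def tau(c):
    """sqrt 2 -> -sqrt 2, i fixed."""
    return (c[0], c[1], -c[2], -c[3])

# The four elements of the Klein four group, by name
GROUP = {
    "1": lambda c: c,
    "sigma": sigma,
    "tau": tau,
    "sigma*tau": lambda c: sigma(tau(c)),
}

def stabilizer(c):
    """Names of the automorphisms that fix the element with coefficients c."""
    c = tuple(Fraction(x) for x in c)
    return [name for name, phi in GROUP.items() if phi(c) == c]

def generates_K(c):
    """True when Q(gamma) = K, i.e. when only the identity fixes gamma."""
    return stabilizer(c) == ["1"]
```

For instance, `generates_K((0, 1, 1, 0))` tests the element $i + \sqrt{2}$, while `stabilizer((0, 0, 0, 1))` lists the automorphisms fixing $i\sqrt{2}$.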

2. CUBIC EQUATIONS 

Having examined biquadratic extensions in the last section, we now turn to the next
general class of examples, the splitting fields of cubic polynomials. Cubic equations

(2.1)    $f(x) = x^3 + a_2x^2 + a_1x + a_0 = 0$

were solved explicitly in terms of square roots and cube roots in the sixteenth century
by the mathematicians Tartaglia and Cardano. We will begin by reviewing their
remarkable ad hoc solution.

The computation is simpler when the coefficient of degree 2 in $f(x)$ vanishes.
The quadratic term in our general equation (2.1) can be eliminated by the substitution

(2.2)    $x = x_1 - a_2/3$.

Let us write a cubic whose quadratic term vanishes as

(2.3)    $f(x) = x^3 + px + q$,

where the coefficients $p, q$ are elements of the field $F$. Cardano's solution of the
equation $f = 0$ starts with the substitution $x = u - v$. Collecting terms in
$f(u - v)$, we find

    $f(u - v) = (u^3 - v^3) - (3uv - p)(u - v) + q$.

The point of replacing the variable $x$ by a sum of variables is that we can now split
our equation apart. Clearly, $f(u - v) = 0$ if the two equations

    $3uv - p = 0$,  $u^3 - v^3 + q = 0$

hold. And since we have two variables, we may hope to obtain solutions to such a
pair of equations, though it isn't clear a priori that this will help. We solve the first
equation for $v = p/(3u)$ and substitute into the second. Clearing the denominator
gives

    $3^3u^6 - p^3 + 3^3u^3q = 0$.

Miraculously, this equation is quadratic in $u^3$. Setting $y = u^3$, it reduces to

(2.4)    $3^3y^2 + 3^3qy - p^3 = 0$.

This equation can be solved by the quadratic formula: 
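
Carried out explicitly, and offered as a reconstruction from (2.4) and the substitution $x = u - v$, $v = p/(3u)$ rather than as the book's own numbered display, the step reads:

```latex
y = u^3 = -\frac{q}{2} \pm \sqrt{\frac{q^2}{4} + \frac{p^3}{27}},
\qquad
x = u - v = u - \frac{p}{3u}.
```

Here $u$ is any cube root of $y$; the identity $27y^2 + 27qy - p^3 = 0$ is exactly the condition $u^3 - v^3 + q = 0$ after multiplying by $27y$.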



We will be able to prove the existence of a solution of this general type later, with¬ 
out explicit computation [see (7.6)]. 

Let us now examine the Galois theory of an irreducible cubic polynomial $f(x)$.
We may assume that $f(x)$ has the form (2.3). Let $K$ be a splitting field of $f(x)$ over $F$,
and let $\alpha_1, \alpha_2, \alpha_3$ be the three roots of $f(x)$ in $K$, ordered in an arbitrary way, so that

(2.7)    $f(x) = x^3 + px + q = (x - \alpha_1)(x - \alpha_2)(x - \alpha_3)$.

Expanding the right side of this equation, we obtain the relations

         $\alpha_1 + \alpha_2 + \alpha_3 = 0$
(2.8)    $\alpha_1\alpha_2 + \alpha_1\alpha_3 + \alpha_2\alpha_3 = p$
         $\alpha_1\alpha_2\alpha_3 = -q$.

The first of these relations shows that the third root $\alpha_3$ is in the field generated by the
first two roots. Thus we have a chain of fields

    $F \subset F(\alpha_1) \subset K$,

and $K = F(\alpha_1, \alpha_2) = F(\alpha_1, \alpha_2, \alpha_3)$. Let us denote $F(\alpha_1)$ by $L$. There are two fundamentally
different cases which may arise, namely either

(2.9)    $L = K$  or  $L < K$.

In terms of the roots, the first case occurs when the last two roots $\alpha_2$ and $\alpha_3$ can be
expressed in terms of $\alpha_1$ and elements of $F$, that is, if they can be written as polynomials
in $\alpha_1$ with coefficients in $F$ [see Chapter 13 (2.6)]. The second case occurs
when the last two roots cannot be expressed in this way.



For example, let $f(x) = x^3 - 2$. The three roots of this polynomial are
$\alpha_1 = \sqrt[3]{2}$, $\alpha_2 = \zeta\sqrt[3]{2}$, $\alpha_3 = \zeta^2\sqrt[3]{2}$, where $\sqrt[3]{2}$ denotes the real cube root of 2 and
$\zeta = e^{2\pi i/3}$. Since $\alpha_1$ is real, the field $\mathbb{Q}(\alpha_1)$ is contained in $\mathbb{R}$. It doesn't contain the
other two roots, which are complex. Hence if $F = \mathbb{Q}$ and $L = \mathbb{Q}(\alpha_1)$, we are in the
second case. On the other hand, if we let $F = \mathbb{Q}(\zeta)$, then $F(\alpha_1)$ contains $\alpha_2$, so we
are in the first case.


To analyze the dichotomy (2.9), we consider the way the irreducible polynomial
$f(x)$ factors in the field $L$. By assumption, $f(x)$ is irreducible in $F[x]$, and it factors
into linear factors in $K[x]$. In the ring $L[x]$, $f(x)$ has the factor $(x - \alpha_1)$:

(2.10)    $f(x) = (x - \alpha_1)h(x)$,

where $h(x)$ is a quadratic polynomial with coefficients in $L$. Division by $x - \alpha_1$
gives the same result if it is carried out in the larger field $K$. Looking at (2.7), we
see that $h(x) = (x - \alpha_2)(x - \alpha_3)$ in $K[x]$. Therefore $L < K$ if and only if $h(x)$ is
irreducible over $L$. In this case, the degree of $L(\alpha_2) = K$ over $L$ is 2. Also, since we
assume $f(x)$ irreducible over $F$, $[L:F] = 3$ in either case. So we have

(2.11)    $[K:F] = \begin{cases} 3 & \text{if } L = K \\ 6 & \text{if } L < K. \end{cases}$



(2.12) Example. The polynomial $f(x) = x^3 + 3x + 1$ is irreducible over $\mathbb{Q}$, and it
has only one real root. To see that there is only one real root, we note that the
derivative of $f$ does not vanish on the real line. Therefore $f(x)$ defines an increasing
function of the real variable $x$. It takes the value 0 only once. The real root does not
generate the splitting field $K$, which also contains two complex roots. So
$[K:\mathbb{Q}] = 6$ in this case.

On the other hand, the splitting field of the polynomial $f(x) = x^3 - 3x + 1$
over $\mathbb{Q}$ has degree 3. One of its roots is $\eta_1 = 2\cos 2\pi/9 = \zeta + \zeta^{-1}$, where
$\zeta = e^{2\pi i/9}$. Having the polynomial in hand, we can check this directly. But actually,
we made this example by computing the irreducible polynomial for $\eta_1$ over $\mathbb{Q}$. The
way to compute this polynomial is to guess its other roots. We note that $\eta_1$ is the
sum of a ninth root of 1 and its inverse. There are two other sums of this sort:
$\eta_2 = \zeta^2 + \zeta^{-2}$ and $\eta_3 = \zeta^4 + \zeta^{-4}$. We guess that these are the other roots and expand
$(x - \eta_1)(x - \eta_2)(x - \eta_3)$, obtaining $f$. In this example, $\eta_2$ happens to be equal to
$\eta_1^2 - 2$, and $\eta_3 = -\eta_1 - \eta_2$. So $K = \mathbb{Q}(\eta_1)$. □
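
The claims about $\eta_1, \eta_2, \eta_3$ are easy to confirm in floating point. A minimal sketch, using the fact that $\zeta^k + \zeta^{-k} = 2\cos(2\pi k/9)$; the variable names are illustrative only.

```python
import math

def f(x):
    """The cubic x^3 - 3x + 1 of Example (2.12)."""
    return x**3 - 3*x + 1

theta = 2 * math.pi / 9
eta1 = 2 * math.cos(theta)        # zeta   + zeta^-1
eta2 = 2 * math.cos(2 * theta)    # zeta^2 + zeta^-2
eta3 = 2 * math.cos(4 * theta)    # zeta^4 + zeta^-4

# Largest residual |f(eta_i)|; it should vanish up to rounding error
max_residual = max(abs(f(e)) for e in (eta1, eta2, eta3))
```

The relations $\eta_2 = \eta_1^2 - 2$ and $\eta_1 + \eta_2 + \eta_3 = 0$ can be compared the same way, which is what places all three roots in $\mathbb{Q}(\eta_1)$.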


We go back to a general cubic equation. According to Theorem (1.11), the order
of the Galois group $G = G(K/F)$ is the degree of the field extension $[K:F]$.
For cubic equations, this degree determines the group $G$ completely. Namely, Proposition
(1.14) tells us that $G$ operates faithfully on the set $\{\alpha_1, \alpha_2, \alpha_3\}$ of roots. These
roots are distinct [Chapter 13 (5.8)]. So $G$ is a subgroup of the symmetric group $S_3$,
which has order 6. If $[K:F] = 6$, then $G$ is the whole symmetric group. In this
case any permutation of the roots is realized by an $F$-automorphism of $K$. On the
other hand, the only subgroup of $S_3$ of order 3 is the alternating group $A_3$, a cyclic
group. So if $[K:F] = 3$, then $G = A_3$. In this case the cyclic permutations and the
identity are the only ones which extend to $F$-automorphisms. Thus the roots of an irreducible
cubic polynomial may have either dihedral or cyclic symmetry. But these
symmetries are algebraic; they will not be symmetries of $K$ when this field is viewed
as a set of points in the complex plane.

Let us determine the intermediate fields in the case that the degree $[K:F]$ is
6. (There are no intermediate fields properly between $F$ and $K$ when $[K:F] = 3$.)
The symmetric group $S_3$ has three conjugate subgroups of order 2 and one subgroup,
$A_3$, of order 3. There are three obvious intermediate fields: $F(\alpha_1)$, $F(\alpha_2)$, $F(\alpha_3)$.
They are isomorphic but not equal subfields of $K$, and they correspond to the three
subgroups of order 2. But the intermediate field which corresponds to the subgroup
$A_3$ is not obvious. Let us denote this mystery field by $L$. According to the Main Theorem,
$G(K/L) = A_3$. Hence $[K:L] = 3$ and $[L:F] = 2$. So $L$ is a quadratic extension
of $F$, which can be obtained by adjoining a square root. The Main Theorem
has told us an interesting fact: $K$ contains the square root $\delta$ of an element of $F$. And
since there is only one intermediate extension of degree 2, this square root is essentially
unique. The Main Theorem also tells us that $L$ is the fixed field of the subgroup
$A_3$. So an even permutation of the roots leaves $\delta$ fixed, while an odd permutation
does not. The required element is

(2.13)    $\delta = (\alpha_1 - \alpha_2)(\alpha_1 - \alpha_3)(\alpha_2 - \alpha_3)$.

A permutation of the roots multiplies $\delta$ by the sign of the permutation. Hence $\delta$ is
not fixed by all elements of $G(K/F) = S_3$, so $\delta \notin F$. But $\delta^2$ is fixed by every permutation.
Corollary (1.9) tells us that $\delta^2 \in F$.

For any cubic polynomial $f(x) = (x - \alpha_1)(x - \alpha_2)(x - \alpha_3)$, the element

(2.14)    $D = (\alpha_1 - \alpha_2)^2(\alpha_1 - \alpha_3)^2(\alpha_2 - \alpha_3)^2$

is called the discriminant of the polynomial. It is an element of the field $F$ which is
zero if and only if two roots of $f(x)$ are equal. So it is analogous to the discriminant
of the quadratic polynomial $x^2 + bx + c = (x - \alpha_1)(x - \alpha_2)$, which is $b^2 - 4c =
(\alpha_1 - \alpha_2)^2$. If the cubic $f$ is irreducible, then its roots are distinct, hence $D \neq 0$.

The fact that the discriminant of the cubic polynomial is an element of $F$ follows
from Corollary (1.9), but it is not trivial. We will prove it abstractly in the next
section, but it can also be checked by direct calculation. Using formulas (2.8), we
can compute the discriminant in terms of the coefficients $p, q$. It is

(2.15)    $D = -4p^3 - 27q^2$.

(2.16) Proposition. The discriminant of an irreducible cubic polynomial
$f(x) \in F[x]$ is a square in $F$ if and only if the degree of the splitting field is 3.

If we choose a polynomial with integer coefficients at random, the chances are
good that its discriminant will not be a square in $\mathbb{Q}$. For example, the discriminant
of $x^3 + 3x + 1$ is $-135$. On the other hand, the discriminant of $x^3 - 3x + 1$ is 81,
a square. This agrees with the fact that $[K:F] = 3$ [see (2.12)].
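
Formula (2.15) and Proposition (2.16) combine into a small test over $\mathbb{Q}$. The sketch below assumes integer coefficients $p, q$ and assumes the cubic is irreducible (this is not checked); the function names are hypothetical.

```python
import math

def cubic_discriminant(p, q):
    """Discriminant -4p^3 - 27q^2 of x^3 + p*x + q, as in (2.15)."""
    return -4 * p**3 - 27 * q**2

def is_rational_square(d):
    """An integer is a square in Q exactly when it is a square in Z."""
    return d >= 0 and math.isqrt(d) ** 2 == d

def splitting_degree(p, q):
    """[K:Q] for x^3 + p*x + q, ASSUMING the cubic is irreducible over Q."""
    return 3 if is_rational_square(cubic_discriminant(p, q)) else 6
```

With the two polynomials of Example (2.12), `splitting_degree(3, 1)` and `splitting_degree(-3, 1)` distinguish the degree 6 and degree 3 cases.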

Proof of the Proposition. If $D$ is not a square, then $\delta \notin F$, and therefore
$[F(\delta):F] = 2$. Since $\delta \in K$, $[K:F]$ is divisible by 2, hence by (2.11), $[K:F] = 6$.
On the other hand, if $\delta \in F$, then every element of the Galois group
$G = G(K/F)$ fixes $\delta$. Since odd permutations change the sign of $\delta$, they are not in
$G$, and hence $G \neq S_3$. Therefore $[K:F] = 3$. □


How could such a proposition be true? There must be a formula which expresses
the second root $\alpha_2$ in terms of the elements $\alpha_1$, $\delta$, and the coefficients $p, q$.
This formula exists, and it is instructive to compute it explicitly.


3. SYMMETRIC FUNCTIONS 

Galois theory is concerned with the problem of determining those permutations of
the roots of a polynomial which extend to field automorphisms. In this section we examine
a simple situation in which every permutation extends, namely when the roots
are independent variables.

Let $R$ be any ring, and consider the polynomial ring $R[u_1, \ldots, u_n]$ in $n$ variables
$u_i$. A permutation $\sigma$ of $\{1, \ldots, n\}$ can be made to operate on polynomials, by permuting
the variables. We must decide here how we want permutations to operate. Let us
keep automorphisms on the left. Then $\sigma$ operates by the inverse permutation on the
indices:

(3.1)    $f = f(u_1, \ldots, u_n) \mapsto f(u_{1\sigma^{-1}}, \ldots, u_{n\sigma^{-1}}) = \sigma f$.

This is clearly an automorphism of $R[u]$. Since it acts as the identity on $R$, $\sigma$ is
called an $R$-automorphism. So the symmetric group $S_n$ operates by $R$-automorphisms
on the polynomial ring $R[u]$. A polynomial is called symmetric if it is left fixed by
all permutations.

It is easy to describe the symmetric polynomials. In order for $g$ to be symmetric,
two monomials in $\{u_1, \ldots, u_n\}$ which differ by a permutation of the indices, such
as $u_1^2u_2$ and $u_2^2u_3$, must have the same coefficients in $g$. A symmetric polynomial
which involves a given monomial must include the whole orbit. Thus

    $g(u) = (u_1^3 + u_2^3 + u_3^3) + 5(u_1^2u_2 + u_1^2u_3 + u_2^2u_3 + u_2^2u_1 + u_3^2u_2 + u_3^2u_1) - u_1u_2u_3$

is a symmetric polynomial of degree 3 in three variables.

There are $n$ special symmetric polynomials with integer coefficients, called the
elementary symmetric functions $s_i$:

(3.2)    $s_1 = u_1 + u_2 + \cdots + u_n$

         $s_2 = u_1u_2 + u_1u_3 + \cdots + u_{n-1}u_n = \sum_{i<j} u_iu_j$

         $s_3 = \sum_{i<j<k} u_iu_ju_k$

         $\vdots$

         $s_n = u_1u_2 \cdots u_n$.

They are the coefficients of the polynomial $(x - u_1)(x - u_2) \cdots (x - u_n)$ when it is
expanded as a polynomial in $x$:

(3.3)    $p(x) = (x - u_1)(x - u_2) \cdots (x - u_n) = x^n - s_1x^{n-1} + s_2x^{n-2} - \cdots \pm s_n$.

We have reversed the order of the indices and alternated the sign here. The
coefficients $s_i$ are symmetric because $p(x)$ is symmetric with respect to permutation
of the indices.

The main theorem on symmetric functions asserts that the elementary symmetric
functions generate the ring of all symmetric polynomials:

(3.4) Theorem. Every symmetric polynomial $g(u_1, \ldots, u_n) \in R[u]$ can be written
in a unique way as a polynomial in the elementary symmetric functions $s_1, \ldots, s_n$. In
other words, let $z_1, \ldots, z_n$ be variables. For each symmetric polynomial $g(u)$, there is
a unique polynomial $\varphi(z_1, \ldots, z_n) \in R[z_1, \ldots, z_n]$ such that

    $g(u_1, \ldots, u_n) = \varphi(s_1, \ldots, s_n)$.

The proof of this theorem is at the end of the section.


For example,

(3.5)    $u_1^2 + \cdots + u_n^2 = s_1^2 - 2s_2$.


The discriminant of the polynomial $p(x)$ (3.3), defined to be

         $D = (u_1 - u_2)^2(u_1 - u_3)^2 \cdots (u_{n-1} - u_n)^2$

(3.6)      $= \prod_{i<j} (u_i - u_j)^2 = \pm \prod_{i \neq j} (u_i - u_j)$,

is perhaps the most important symmetric polynomial. Both of the last two expressions
for the discriminant are convenient at times, so it is unfortunate that they may
differ by a sign. To go from the second expression for $D$ to the last one requires
$\frac{1}{2}n(n - 1)$ sign changes, so the correct sign to replace the symbol $\pm$ is

(3.7)    $(-1)^{n(n-1)/2}$.


It is clear that $D$ is a symmetric polynomial with integer coefficients. So Theorem
(3.4) tells us that it can be written as an integer polynomial in the elementary
symmetric functions. In other words, there exists a polynomial

(3.8)    $\Delta(z_1, \ldots, z_n) \in \mathbb{Z}[z_1, \ldots, z_n]$

so that $D = \Delta(s_1, \ldots, s_n)$. Unfortunately, this expression for $D$ in terms of the elementary
symmetric functions is very complicated. I don't know what it is for
$n > 3$.


We can compute the discriminant for $n = 2$ easily:

(3.9)    $(u_1 - u_2)^2 = s_1^2 - 4s_2$.

This is the familiar formula for the discriminant of the quadratic polynomial
$p(x) = x^2 - s_1x + s_2$. When $n = 3$, the expression for the discriminant is already
too complicated to remember:

(3.10)    $(u_1 - u_2)^2(u_1 - u_3)^2(u_2 - u_3)^2 = s_1^2s_2^2 - 4s_2^3 - 4s_1^3s_3 - 27s_3^2 + 18s_1s_2s_3$.
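
Identities such as (3.5), (3.9), and (3.10) can be spot-checked by substituting integers for the variables. The following is a minimal sketch; the helper names `elementary_symmetric`, `discriminant_from_roots`, `delta3`, and `check_identity_310` are hypothetical, and a finite check of this kind illustrates the identity rather than proving it.

```python
from itertools import combinations, product
from math import prod

def elementary_symmetric(values):
    """Return [s_1, ..., s_n] evaluated at the given values, as in (3.2)."""
    n = len(values)
    return [sum(prod(c) for c in combinations(values, k)) for k in range(1, n + 1)]

def discriminant_from_roots(values):
    """Product of (u_i - u_j)^2 over all pairs i < j, as in (3.6)."""
    return prod((a - b) ** 2 for a, b in combinations(values, 2))

def delta3(s1, s2, s3):
    """Right-hand side of (3.10)."""
    return (s1**2 * s2**2 - 4 * s2**3 - 4 * s1**3 * s3
            - 27 * s3**2 + 18 * s1 * s2 * s3)

def check_identity_310(limit=3):
    """Compare both sides of (3.10) on every integer triple in [-limit, limit]^3."""
    for u in product(range(-limit, limit + 1), repeat=3):
        s1, s2, s3 = elementary_symmetric(list(u))
        if discriminant_from_roots(u) != delta3(s1, s2, s3):
            return False
    return True
```

The same helpers check (3.9) for two values and (3.5) for any number of values, by comparing `sum(u*u for u in values)` with `s1**2 - 2*s2`.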

It is important to note that such an expression is an identity in $\mathbb{Z}[u_1, \ldots, u_n]$. It
remains true when substitutions are made for the variables $u_i$. If we are given particular
elements $\{\alpha_1, \ldots, \alpha_n\}$ in a ring $R$, we can expand the polynomial obtained by
substituting $\alpha_i$ for $u_i$ in $p(x)$:

    $(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n) = x^n - b_1x^{n-1} + b_2x^{n-2} - \cdots \pm b_n$.

The indices and the signs have been adjusted to agree with (3.3). Then

    $b_i = s_i(\alpha_1, \ldots, \alpha_n)$,

and

    $\prod_{i<j} (\alpha_i - \alpha_j)^2 = \Delta(b_1, \ldots, b_n)$.

This follows by substitution of $\alpha_i$ for $u_i$.

It is also important that the expression of a symmetric polynomial in terms of
the elementary symmetric functions is unique:

(3.11) Corollary. There are no polynomial relations among the elementary symmetric
functions $s_1, \ldots, s_n$. Equivalently, the subring $R[s_1, \ldots, s_n]$ of $R[u]$ generated
by $\{s_i\}$ is isomorphic to the polynomial ring $R[z_1, \ldots, z_n]$ in $n$ variables.

This is a restatement of the uniqueness in Theorem (3.4). □

The corollary can be used in the following way: Let

(3.12)    $f(x) = x^n - a_1x^{n-1} + a_2x^{n-2} - \cdots \pm a_n$

be a polynomial with coefficients in a ring $R$. We define the discriminant of $f(x)$ to
be the element $\Delta(a_1, \ldots, a_n)$ of $R$, where $\Delta(z_1, \ldots, z_n)$ is the polynomial (3.8). Since
this polynomial is unique, the discriminant is defined, whether the polynomial is a
product of linear factors in $R[x]$ or not.

For example, let $n = 3$. Then formula (3.10) shows that

(3.13)    $\Delta(0, p, -q) = -4p^3 - 27q^2$,

which agrees with the formula (2.15) for the discriminant of the cubic polynomial
$x^3 + px + q$.

We can use undetermined coefficients to compute the expression of a symmetric
polynomial in terms of the elementary symmetric functions. To apply this
method, we notice that the elementary symmetric function $s_i$ has degree $i$ in the
variables $u$. That is why we chose the index $i$ for it. So we assign the weight $i$ to the
variable $z_i$, and we define the weighted degree of a monomial $z_1^{e_1}z_2^{e_2} \cdots z_n^{e_n}$ to be

(3.14)    $e_1 + 2e_2 + \cdots + ne_n$.

Substitution of $s_i$ for $z_i$ into a polynomial of weighted degree $d$ in $z$ yields a polynomial
of (ordinary) degree $d$ in $u$.

For example, to compute the discriminant of a cubic polynomial in terms of
the elementary symmetric functions, we notice that its degree in $u$ is 6. There are
seven monomials in $z_1, z_2, z_3$ of weighted degree 6:

(3.15)    $z_1^6$,  $z_1^4z_2$,  $z_1^3z_3$,  $z_1^2z_2^2$,  $z_1z_2z_3$,  $z_2^3$,  $z_3^2$.

So $D$ is a linear combination of these monomials. To compute its coefficients, we
evaluate $D$ on some special polynomials: Setting $f(x) = x^2(x - 1)$, we get $D = 0$,
$s_1 = 1$, and $s_2 = s_3 = 0$. Since the only one of the monomials (3.15) which does
not involve $z_2$ or $z_3$ is $z_1^6$, the coefficient of $z_1^6$ in the discriminant is zero. The
coefficients of $z_2^3$ and $z_3^2$ can be computed using the special polynomials $x^3 - x$ and
$x^3 - 1$, for example.
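
The method of undetermined coefficients can be automated. The sketch below is one way to do it, not the text's hand computation: instead of picking a few clever polynomials, it evaluates $D$ and the seven monomials (3.15) at all integer triples $u_1 \le u_2 \le u_3$ in $\{-3, \ldots, 3\}$ and solves the resulting linear system exactly. The names `MONOMIALS`, `sample_row`, and `solve` are hypothetical. Seven sample values per variable suffice because each monomial has degree at most 6 in each $u_i$.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Exponents (e1, e2, e3) with e1 + 2*e2 + 3*e3 = 6, in the order of (3.15)
MONOMIALS = [(6, 0, 0), (4, 1, 0), (3, 0, 1), (2, 2, 0),
             (1, 1, 1), (0, 3, 0), (0, 0, 2)]

def sample_row(u1, u2, u3):
    """One linear equation: monomial values at (s1, s2, s3), then the value of D."""
    s1 = u1 + u2 + u3
    s2 = u1*u2 + u1*u3 + u2*u3
    s3 = u1*u2*u3
    D = ((u1 - u2) * (u1 - u3) * (u2 - u3)) ** 2
    return [Fraction(s1**a * s2**b * s3**c) for a, b, c in MONOMIALS] + [Fraction(D)]

def solve(rows):
    """Gauss-Jordan elimination over Q for a consistent system of full column rank."""
    rows = [r[:] for r in rows]
    n = len(rows[0]) - 1                      # number of unknowns
    for col in range(n):
        # Find a row at or below position col with a nonzero entry in this column
        pivot = next(i for i in range(col, len(rows)) if rows[i][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for i in range(len(rows)):
            if i != col and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[col])]
    return [rows[i][n] for i in range(n)]

samples = [sample_row(*u) for u in combinations_with_replacement(range(-3, 4), 3)]
coefficients = solve(samples)   # coefficients of the monomials (3.15), in order
```

The resulting list can be compared term by term with formula (3.10).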

Proof of Theorem (3.4). Let's warm up by working out the case of the symmetric
polynomial

    $f(u) = u_1^2u_2 + u_1^2u_3 + u_2^2u_1 + u_2^2u_3 + u_3^2u_1 + u_3^2u_2$

as an example. To analyze it, our first step is to set $u_3 = 0$. We obtain a symmetric
polynomial $f^0 = u_1^2u_2 + u_2^2u_1$ in the remaining variables $u_1, u_2$. Let us denote the
elementary symmetric functions in $u_1, u_2$ by $s_1^0 = u_1 + u_2$ and $s_2^0 = u_1u_2$. We notice
that $f^0 = s_1^0s_2^0$.

The second step is to compare $f$ with the polynomial $s_1s_2$ in three variables. We
compute the polynomial $f - s_1s_2$, where $s_1 = u_1 + u_2 + u_3$ and $s_2 = u_1u_2 +
u_1u_3 + u_2u_3$, finding that

    $f - s_1s_2 = -3u_1u_2u_3$.

We recognize this polynomial as $-3s_3$. So $f = s_1s_2 - 3s_3$.

The general case is similar. There is nothing to show when $n = 1$, because
$u_1 = s_1$ in that case. Proceeding by induction, we assume the theorem proved for
$n - 1$ variables. Given a symmetric polynomial $f$ in $u_1, \ldots, u_n$, we consider the polynomial
$f^0$ obtained by substituting zero for the last variable: $f^0(u_1, \ldots, u_{n-1}) =
f(u_1, \ldots, u_{n-1}, 0)$. We note that $f^0$ is a symmetric polynomial in $u_1, \ldots, u_{n-1}$. By the
induction hypothesis, $f^0$ may be expressed as a polynomial in the elementary symmetric
functions in $\{u_1, \ldots, u_{n-1}\}$, which we denote by

    $s_1^0 = u_1 + \cdots + u_{n-1}$,  $\ldots$,  $s_{n-1}^0 = u_1 \cdots u_{n-1}$.

So we can write $f^0 = g(s_1^0, \ldots, s_{n-1}^0)$. Moreover, it follows from the definition of
the polynomials $s_i$ that

    $s_i^0 = s_i(u_1, \ldots, u_{n-1}, 0)$,  if $i = 1, \ldots, n - 1$.

Consider the polynomial

    $p(u_1, \ldots, u_n) = f(u_1, \ldots, u_n) - g(s_1, \ldots, s_{n-1})$,

as a polynomial in $u_1, \ldots, u_n$. Being a difference of symmetric polynomials, this
polynomial is symmetric. Also, it has the property that $p(u_1, \ldots, u_{n-1}, 0) = 0$.
Therefore every monomial occurring in $p$ is divisible by $u_n$. By symmetry, $p$ is divisible
by $u_i$ for every $i$, and hence it is divisible by $s_n$. So

(3.16)    $f(u_1, \ldots, u_n) = g(s_1, \ldots, s_{n-1}) + s_n h(u_1, \ldots, u_n)$,

for some symmetric polynomial $h$. We now work on $h(u_1, \ldots, u_n)$. By induction on
the degree, $h$ is a polynomial in the symmetric functions, and hence so is $f$.

It remains to prove the uniqueness of the expression in terms of $s_1, \ldots, s_n$. The uniqueness means that
there is only one polynomial $\varphi(z_1, \ldots, z_n)$ in the variables $z_i$ such that
$\varphi(s_1, \ldots, s_n) = f(u_1, \ldots, u_n)$, as polynomials in $u_1, \ldots, u_n$. In other words, the kernel
of the substitution map

    $\sigma$: $R[z] \to R[u]$

sending $z_i \mapsto s_i$ is zero. To show this, suppose $\varphi(s_1, \ldots, s_n) = 0$ for some
$\varphi \in R[z]$. Setting $u_n = 0$ in this expression we still get zero: $\varphi(s_1^0, \ldots, s_{n-1}^0, 0) =
0$. By induction on $n$, this implies that $\varphi(z_1, \ldots, z_{n-1}, 0) = 0$. Therefore $z_n$ divides
$\varphi(z)$, and we may write $\varphi(z) = z_n\psi(z)$. Then $0 = \varphi(s) = s_n\psi(s) = u_1 \cdots u_n\psi(s)$.
Since the product $u_1 \cdots u_n$ is not a zero-divisor in the polynomial ring $R[u]$,
$\psi(s) = 0$. The polynomial $\psi(z)$ has lower total degree in $z$ than $\varphi(z)$, so we may apply
induction on the degree to conclude that $\psi = 0$. Hence $\varphi = 0$ too. □

Now suppose that $R = F$ is a field. Then we may also consider the field of rational
functions in the variables $u_i$, that is, the field of fractions of $F[u_1, \ldots, u_n]$. The
symmetric group also acts on this field, and the corresponding assertion is true:

(3.17) Theorem. Every symmetric rational function is a rational function in
$s_1, \ldots, s_n$.

Proof. Let $r(u) = f(u)/g(u)$ be a symmetric rational function, where
$f, g \in F[u]$. We can build a symmetric function from $g$ by multiplying all the $\sigma g$
together:

    $G = \prod_{\sigma \in S_n} \sigma g$

is a symmetric polynomial. Then $G(u)r(u)$ is a symmetric rational function, and it is
also a polynomial in $\{u_1, \ldots, u_n\}$, hence a symmetric polynomial. By Theorem (3.4), $G(u)$
and $G(u)r(u)$ are polynomials in the elementary symmetric functions $\{s_i\}$. Thus $r(u)$
is a rational function in $\{s_i\}$. □

The pair of fields

(3.18)    $F(s) = F(s_1, \ldots, s_n) \subset F(u_1, \ldots, u_n) = F(u)$

is an example of a Galois extension. This follows from Theorem (1.11), because
$F(u)$ is a splitting field of the polynomial $p(x)$ (3.3) and because the roots $u_1, \ldots, u_n$
are distinct. By Proposition (1.14), the Galois group $G = G(F(u)/F(s))$ operates
faithfully on the roots. On the other hand, $G$ contains the full symmetric group, by
construction. Therefore $G = S_n$. As a corollary, we find that $[F(u):F(s)] = n!$.
Needless to say, this can be proved directly.


4. PRIMITIVE ELEMENTS 

At the end of the first section, we saw that generically chosen elements of a biquadratic
extension $K/F$ generate $K$. It is possible to derive a general statement of
this type as a corollary of the Main Theorem of Galois theory. But we are going to
prove it directly instead, and then use this fact in the proof of the Main Theorem.

(4.1) Theorem. Existence of a primitive element: Let $K$ be a finite extension of a
field $F$ of characteristic zero. There is an element $\gamma \in K$ such that $K = F(\gamma)$.

An element $\gamma$ which generates a field extension $K/F$ is called a primitive element
for $K$ over $F$. So the theorem can be restated by saying that every finite extension
$K$ of a field $F$ has a primitive element. We have restated our general hypothesis
that $F$ has characteristic zero here because this theorem is not true for fields of characteristic
$p$.

Proof of Theorem (4.1). We use induction on the number of generators of $K$.
Say that $K = F(\alpha_1, \ldots, \alpha_n)$. If $n = 1$, there is nothing to prove. For $n > 1$, the
induction principle allows us to assume the theorem true for the intermediate field
$K_1 = F(\alpha_1, \ldots, \alpha_{n-1})$. So we may assume that $K_1$ is generated by a single element $\beta$.
Then $K = K_1(\alpha_n) = F(\beta, \alpha_n)$. We have to show that this field has a primitive
element. We are thereby reduced to the case that $n = 2$, so that $K$ is generated by two
elements $\alpha, \beta$.

Let $f(x), g(x)$ be the irreducible polynomials for $\alpha, \beta$ over $F$, and let $K'$ be
an extension of $K$ in which $f$ and $g$ split completely [Chapter 13 (5.3)]. Call their
roots $\alpha = \alpha_1, \ldots, \alpha_m$ and $\beta = \beta_1, \ldots, \beta_n$. By Chapter 13 (5.8), the elements $\alpha_i$ are
distinct.

We are going to show that for most choices of $c \in F$, the linear combination
$\gamma = \beta + c\alpha$ generates $K$. Let us denote the field $F(\gamma)$ by $L$. It suffices to show that
$\alpha \in L$, because if so, then $\beta = \gamma - c\alpha$ will be in $L$ too, and this will imply that
$L = K$. The way we show that $\alpha$ is in $L$ is indirect: We determine its irreducible
polynomial over $L$. As we know, this is the monic polynomial of least degree in
$L[x]$ which has $\alpha$ as a root.

To begin with, $\alpha$ is a root of $f(x)$. The trick is to use the polynomial $g(x)$ to
cook up a second polynomial with the root $\alpha$, namely $h(x) = g(\gamma - cx)$. Notice
that $h(x)$ has coefficients in $L$ and that $h(\alpha) = g(\beta) = 0$. If we show that the greatest
common divisor of $f$ and $h$ in $L[x]$ is $x - \alpha$, then it will follow that $-\alpha$, being one of the
coefficients of $x - \alpha$, is in $L$. Now the monic greatest common divisor of $f$ and $h$ is
the same, whether computed in $L[x]$ or in $K'[x]$ [Chapter 13 (5.4)]. So we may
make our computation in $K'[x]$. In that ring, $f$ is a product of the linear factors
$x - \alpha_i$, and it suffices to show that none of them divides $h$, that is, that none of the
elements $\alpha_i$, except for $\alpha = \alpha_1$ itself, is a root of $h(x)$. Having gotten this far, the
rest is just a matter of computing the roots of $h$.

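Since the roots of $g$ are $\beta_j$, the roots of $h(x) = g(\gamma - cx)$ are obtained by
solving the equations

$\gamma - cx = \beta_j$

for $x$. Since $\gamma = \beta + c\alpha$, the roots are $(\gamma - \beta_j)/c = (\beta - \beta_j)/c + \alpha$. We want
these roots to be different from $\alpha_i$, $i \neq 1$. This will be so provided that $c$ does not
take one of the finitely many values

(4.2) $\dfrac{\beta - \beta_j}{\alpha_i - \alpha}$

with $i \neq 1$ and $j$ arbitrary (the case $j = 1$ gives the value $c = 0$). □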


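(4.3) Example. Consider the field $K = \mathbb{Q}[i, \sqrt[3]{2}]$. This field has degree 6 over $\mathbb{Q}$
[see Chapter 13 (3.5d)]. In the notation of the previous proof, we have $\beta_1 = i$,
$\beta_2 = -i$, and $\alpha_1 = \sqrt[3]{2}$, $\alpha_2 = \zeta\sqrt[3]{2}$, $\alpha_3 = \zeta^2\sqrt[3]{2}$, where $\zeta = e^{2\pi i/3}$. Condition
(4.2) becomes

$c \neq \dfrac{i - \beta_j}{\alpha_k - \sqrt[3]{2}}$, $j = 1, 2$, $k = 2, 3$.

This condition holds for all $c \in \mathbb{Q}$ except $c = 0$. Therefore $\gamma = i + c\sqrt[3]{2}$
generates $K$ over $\mathbb{Q}$ for all rational numbers $c \neq 0$. Of course, many other combinations
of the two elements $\beta, \alpha$ will generate $F(\beta, \alpha)$. In this example, the product $i\sqrt[3]{2}$
also generates $K$. □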
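The exceptional values of $c$ in this example can be checked numerically. The following is only a floating-point sketch, not a proof: it lists the quotients $(\beta - \beta_j)/(\alpha_i - \alpha)$ with $i \neq 1$ for $\alpha = \sqrt[3]{2}$, $\beta = i$, and keeps those which are real up to a tolerance. The function name `excluded_values` and the tolerance `1e-9` are our own choices for illustration.

```python
import cmath

def excluded_values(alphas, betas):
    """Quotients (beta_1 - beta_j) / (alpha_i - alpha_1) with i != 1, j arbitrary."""
    a1, b1 = alphas[0], betas[0]
    return [(b1 - bj) / (ai - a1) for ai in alphas[1:] for bj in betas]

zeta = cmath.exp(2j * cmath.pi / 3)      # primitive cube root of unity
cbrt2 = 2 ** (1 / 3)                     # real cube root of 2
alphas = [cbrt2, zeta * cbrt2, zeta**2 * cbrt2]
betas = [1j, -1j]

bad = excluded_values(alphas, betas)
# Keep only the excluded values that are real (up to rounding error).
real_bad = sorted({round(c.real, 9) for c in bad if abs(c.imag) < 1e-9})
print(real_bad)
```

Within the tolerance, the only real excluded value is zero, which agrees with the statement that every rational $c \neq 0$ gives a primitive element.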

Theorem (4.1) is important for two reasons. First, explicit computation in an
extension of the form $F(\gamma)$ is easy if the irreducible equation for $\gamma$ over $F$ is known.
Second, since finite extensions have the form $F(\gamma)$, we can derive their properties
from facts about algebraic elements. It is this aspect which is most important for us.

The power of Theorem (4.1) is shown by applying it to the study of automorphisms
of fields. Consider a finite group $G$ of automorphisms of the field $K$, and denote
its fixed field $K^G$ by $F$.


(4.4) Proposition. Let $G$ be a finite group of automorphisms of a field $K$, and let
$F$ be its fixed field. Let $\{\beta_1, \ldots, \beta_r\}$ be the orbit of an element $\beta = \beta_1 \in K$ under
the action of $G$. Then $\beta$ is algebraic over $F$, its degree over $F$ is $r$, and its irreducible
polynomial over $F$ is $g(x) = (x - \beta_1) \cdots (x - \beta_r)$.

Note that the degree of $\beta$, being the order of an orbit, divides the order of the
group.

Proof. Let $f(x)$ be the irreducible polynomial for $\beta$ over $F$. Since $f(x)$ is fixed
by $G$, each of the elements $\beta_i$ is a root of $f$ (1.14), and so $g$ divides $f$. Also, $g$ is
fixed by all permutations of $\{\beta_1, \ldots, \beta_r\}$, and hence by the operation of $G$, which
permutes the orbit. Therefore $g(x) \in F[x]$. Since $f$ is irreducible, $g = f$. □

This proposition provides a method for determining the irreducible polynomial
for an element $\beta$ of a Galois extension $K$ over $F$. For example, let $K$ be the
biquadratic extension $\mathbb{Q}(i, \sqrt{2})$, and let $\beta = i + \sqrt{2}$. The Galois group of $K/\mathbb{Q}$ is the
Klein four group, and the orbit of $\beta$ consists of the four elements $\pm i \pm \sqrt{2}$. So the
irreducible polynomial for $\beta$ over $\mathbb{Q}$ is

$(x - i - \sqrt{2})(x - i + \sqrt{2})(x + i - \sqrt{2})(x + i + \sqrt{2})$

$= (x^2 - 2ix - 3)(x^2 + 2ix - 3) = x^4 - 2x^2 + 9$.

We can also determine this polynomial by computing powers of $\beta$ and finding the
linear relation of smallest degree between them (see Chapter 13, Section 3). However,
the method given here is preferable because it always produces an irreducible
polynomial.
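The orbit method is easy to try numerically. The sketch below multiplies out $(x - \beta_1) \cdots (x - \beta_r)$ for the orbit $\pm i \pm \sqrt{2}$ using complex floating-point arithmetic and rounds the result; the helper name `poly_from_roots` is ours, and rounding is justified here only because we already know the coefficients are integers.

```python
from math import sqrt

def poly_from_roots(roots):
    """Coefficients (highest degree first) of the monic polynomial with the given roots."""
    coeffs = [1 + 0j]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [a - r * b for a, b in zip(coeffs + [0j], [0j] + coeffs)]
    return coeffs

# Orbit of beta = i + sqrt(2) under the Klein four group: all sign choices.
orbit = [s1 * 1j + s2 * sqrt(2) for s1 in (1, -1) for s2 in (1, -1)]
coeffs = poly_from_roots(orbit)

max_imag = max(abs(c.imag) for c in coeffs)   # should vanish up to rounding
rounded = [round(c.real) for c in coeffs]
print(rounded)
```

The rounded coefficient list corresponds to $x^4 - 2x^2 + 9$, and the imaginary parts vanish up to rounding error, as they must for a polynomial with coefficients in the fixed field.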

(4.5) Corollary. Let $K/F$ be a Galois extension, and let $g(x)$ be an irreducible
polynomial in $F[x]$. If $g$ has one root in $K$, then it factors into linear factors in $K[x]$.

Proof. According to Corollary (1.9), $F$ is the fixed field of the Galois group
$G = G(K/F)$. Let $\beta$ be a root of $g(x)$ in $K$. By Proposition (4.4), the irreducible
polynomial for $\beta$ over $F$ is $(x - \beta_1) \cdots (x - \beta_r)$, where $\{\beta_1, \ldots, \beta_r\}$ is the $G$-orbit of
$\beta$. Since $g(x)$ is the irreducible polynomial for $\beta$, it is equal to this product, so it
factors into linear factors in $K$, as asserted. □

The corollary tells us in particular that every Galois extension is a splitting
field, which is part of Theorem (1.11). For, take any generators for $K$ over
$F$, and let $f(x)$ be the product of their irreducible polynomials. Then $f$ splits
completely in $K$, and hence $K$ is a splitting field for $f$.

(4.6) Theorem. Let $G$ be a group of order $n$ of automorphisms of a field $K$, and
let $F$ be its fixed field. Then $[K : F] = n$.

Proof. Proposition (4.4) shows that every element $\beta$ of $K$ is algebraic over $F$
and that its degree divides $n = |G|$. The theorem of the primitive element implies
that the degree of the whole field extension $K/F$ is bounded by $n$ too. To see this, we
form a chain of extension fields as follows: We choose an element $\alpha_1 \in K$ which is
not in $F$, and we set $F_1 = F(\alpha_1)$. Then $[F_1 : F] \leq n$. If $F_1 \neq K$, we choose an
element $\alpha_2 \in K$ which is not in $F_1$, and we set $F_2 = F(\alpha_1, \alpha_2)$. By the theorem of the
primitive element, $F_2$ is generated by a single element $\gamma$, and by Corollary (3.6) of
Chapter 13, the degree of $\gamma$ over $F$ is bounded by $n$. So $[F_2 : F] \leq n$. Continuing in
this way, we obtain a chain $F < F_1 < F_2 < \cdots$ in which $[F_i : F] \leq n$ for all $i$. This
chain must be finite. So $F_i = K$ for some $i$, and $[K : F] \leq n$.

Applying Theorem (4.1) once more, we conclude that $K$ has a primitive
element: $K = F(\beta)$. Any element of $G$ which fixes $\beta$ acts as the identity on
$K = F(\beta)$. Since we are assuming that $G$ is a group of automorphisms of $K$, the
identity is the only such element. Therefore the stabilizer of $\beta$ is $\{1\}$, and the orbit
has order $n$. By Proposition (4.4), $\beta$ has degree $n$ over $F$, and $[K : F] = n$. □

Using the theorem we have just proved, we can derive the first theorem,
Theorem (1.6), which was stated in Section 1. That theorem says that for any finite
extension $K/F$, the order of its Galois group divides its degree. To prove this, we set
$G = G(K/F)$. Then $G$ operates on $K$, so by Theorem (4.6), $|G| = [K : K^G]$. And
since $F \subset K^G \subset K$, $[K : K^G]$ divides $[K : F]$. □

Theorem (4.6) also provides us with a converse to Corollary (1.9): 

(4.7) Corollary. Let $G$ be a finite group of automorphisms of a field $K$, and let $F$
be its fixed field. Then $K$ is a Galois extension of $F$, and its Galois group is $G$.

Proof. By definition of the fixed field, the elements of $G$ are $F$-automorphisms
of $K$. Hence $G \subset G(K/F)$. Since $|G(K/F)| \leq [K : F]$ and $[K : F] = |G|$, it
follows that $|G(K/F)| = [K : F]$ and that $G = G(K/F)$. □

We can get some interesting examples to illustrate Proposition (4.4) and
Theorem (4.6) by considering automorphisms of the field $\mathbb{C}(y) = K$ of rational functions
in $y$. For instance, let $\sigma, \tau$ be the automorphisms of $K$ defined by $y \mapsto -y$ and
$y \mapsto iy^{-1}$. The automorphisms $\{1, \sigma, \tau, \sigma\tau\}$ form a group $G$ of order 4.

(4.8) Proposition. Let $K$ and $G$ be as above. The fixed field $F = K^G$ is the field
$\mathbb{C}(w)$ of rational functions in $w = y^2 - y^{-2}$.

In other words, every rational function $f(y)$ which is fixed by $G$ can be expressed as
a rational function in $w$.

Proof. First of all, $G$ does fix $w = y^2 - y^{-2}$, so $w$ is in the fixed field.
Therefore the fixed field $F$ contains the field $\mathbb{C}(w)$. Next, we compute the irreducible
polynomial for $y$ over $F$. The orbit of $y$ is $\{y, iy^{-1}, -y, -iy^{-1}\}$, so Proposition (4.4) tells us
that the irreducible equation for $y$ is $(x - y)(x - iy^{-1})(x + y)(x + iy^{-1}) =
x^4 - wx^2 - 1$. This polynomial has coefficients in $\mathbb{C}(w)$, so $y$ has degree at most 4 over that
field. It follows that $[K : \mathbb{C}(w)] \leq 4$. On the other hand, $\mathbb{C}(w) \subset F \subset K$, and since
$|G| = 4$, Theorem (4.6) tells us that $[K : F] = 4$. Counting degrees shows that
$\mathbb{C}(w) = F$. □
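For the reader who wants to see the expansion used in this proof, here is the computation written out. It uses only the orbit $\{y, iy^{-1}, -y, -iy^{-1}\}$ and the definition $w = y^2 - y^{-2}$; the factors are paired so that each pair is a difference of squares.

```latex
\begin{aligned}
(x - y)(x + y)\,(x - iy^{-1})(x + iy^{-1})
  &= (x^2 - y^2)\,\bigl(x^2 - (iy^{-1})^2\bigr) \\
  &= (x^2 - y^2)(x^2 + y^{-2}) \\
  &= x^4 - (y^2 - y^{-2})\,x^2 - 1 \\
  &= x^4 - w x^2 - 1 .
\end{aligned}
```

The same pairing also shows directly that $w$ is fixed by $y \mapsto iy^{-1}$, since this substitution sends $y^2$ to $-y^{-2}$ and $y^{-2}$ to $-y^2$.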

A famous theorem called Lüroth's theorem asserts that any subfield of the field
$\mathbb{C}(y)$ which properly contains the complex numbers is the field of rational functions
in some rational function $w$ of $y$.
Let $f(x)$ be a monic polynomial of degree $n$ with coefficients in a field $F$. We recall
that a splitting field of $f(x) \in F[x]$ is a field of the form $K = F(\alpha_1, \ldots, \alpha_n)$, such
that $f(x) = (x - \alpha_1) \cdots (x - \alpha_n)$ in $K[x]$. The existence of a splitting field was
proved in Chapter 13 (5.3). We now want to show that any two splitting fields of a
given polynomial $f(x)$ are isomorphic. This follows from the fact that a field
extension of the form $F(\alpha)$ is determined by the irreducible polynomial for $\alpha$ over $F$, and
from some "bookkeeping." The bookkeeping required for the proof is notationally a
little confusing, but not difficult.

Any isomorphism $\varphi: F \to \tilde{F}$ of fields extends to an isomorphism
$F[x] \to \tilde{F}[x]$ between the polynomial rings by

$a_n x^n + \cdots + a_0 \mapsto \tilde{a}_n x^n + \cdots + \tilde{a}_0$,

where $\tilde{a}_j = \varphi(a_j)$. Let us denote the image of $f(x)$ by $\tilde{f}(x)$. Since $\varphi$ is an
isomorphism, $\tilde{f}(x)$ will be an irreducible polynomial if and only if $f(x)$ is irreducible.

The following lemma generalizes Chapter 13 (2.9). 

(5.1) Lemma. With the above notation, let $f(x)$ be an irreducible polynomial in
$F[x]$. Let $\alpha$ be a root of $f(x)$ in an extension field $K$ of $F$, and let $\tilde{\alpha}$ be a root of $\tilde{f}(x)$
in an extension $\tilde{K}$ of $\tilde{F}$. There is a unique isomorphism

$\varphi_1: F(\alpha) \to \tilde{F}(\tilde{\alpha})$

which restricts to $\varphi$ on the subfield $F$, and which sends $\alpha$ to $\tilde{\alpha}$.

Proof. We know that $F(\alpha)$ is isomorphic to the quotient $F[x]/(f)$, and
similarly $\tilde{F}(\tilde{\alpha})$ is isomorphic to $\tilde{F}[x]/(\tilde{f})$. The rings $F[x]$ and $\tilde{F}[x]$ are isomorphic, as
we just saw, and since $f$ and $\tilde{f}$ correspond under this isomorphism, so do the ideals
$(f)$ and $(\tilde{f})$ which they generate. Therefore the residue rings $F[x]/(f)$ and $\tilde{F}[x]/(\tilde{f})$
are also isomorphic. Combining these isomorphisms yields the required
isomorphism $\varphi_1$. This extension of $\varphi$ is unique because $\alpha$ generates $F(\alpha)$ over $F$. □

(5.2) Proposition. Let $\varphi: F \to \tilde{F}$ be an isomorphism of fields. Let $f(x)$ be a
nonconstant polynomial in $F[x]$, and let $\tilde{f}(x)$ be the corresponding polynomial in
$\tilde{F}[x]$. Let $K$ and $\tilde{K}$ be splitting fields for $f(x)$ and $\tilde{f}(x)$. There is an isomorphism
$\psi: K \to \tilde{K}$ which restricts to $\varphi$ on the subfield $F$ of $K$.

If we let $\tilde{F} = F$ and $\varphi =$ identity, we obtain the following corollary:

(5.3) Corollary. Any two splitting fields of $f(x) \in F[x]$ over $F$ are isomorphic. □

The corollary is the result we are really after. The auxiliary isomorphism $\varphi$ is
introduced into the proposition to make the induction step of the proof work.
Proof of Proposition (5.2). If $f(x)$ factors into linear factors over $F$, then $\tilde{f}(x)$
also factors into linear factors. In this case $K = F$ and $\tilde{K} = \tilde{F}$, so $\psi = \varphi$. Assume
that $f$ does not split completely. Choose an irreducible factor $g(x)$ of $f(x)$ of degree
$> 1$. The corresponding polynomial $\tilde{g}(x)$ will be an irreducible factor of $\tilde{f}(x)$. Let $\alpha$
be a root of $g$ in $K$, and write $F_1 = F(\alpha)$. Make a similar choice of $\tilde{\alpha}$ and $\tilde{F}_1 = \tilde{F}(\tilde{\alpha})$
in $\tilde{K}$. Then by Lemma (5.1), we can extend $\varphi$ to an isomorphism $\varphi_1: F_1 \to \tilde{F}_1$
which sends $\alpha \mapsto \tilde{\alpha}$. Being a splitting field for $f$ over $F$, $K$ is also a splitting field of
$f$ over the larger field $F_1$, and similarly $\tilde{K}$ is a splitting field for $\tilde{f}$ over $\tilde{F}_1$. Therefore
we may replace $F, \tilde{F}, \varphi$ by $F_1, \tilde{F}_1, \varphi_1$ and proceed by induction on the degree of $K$
over $F$. □

We are now in a position to prove the second of the theorems, Theorem
(1.11), which was announced in Section 1. One part of this theorem was proved in
the last section, using Corollary (4.5). For convenience, we restate the other part
here.

Theorem. Let $K$ be the splitting field of a polynomial $f(x) \in F[x]$. Then $K$ is a
Galois extension of $F$; that is, $|G(K/F)| = [K : F]$.

We will prove the theorem by going back over the proof of Proposition (5.2),
keeping careful track of the number of choices.

(5.4) Lemma. With the notation of (5.2), the number of isomorphisms
$\psi: K \to \tilde{K}$ extending $\varphi$ is equal to the degree $[K : F]$.

The theorem follows from this lemma if we set $\tilde{F} = F$, $\tilde{K} = K$, and $\varphi =$ identity. □

Proof of Lemma (5.4). We proceed as in the proof of Proposition (5.2),
choosing an irreducible factor $g(x)$ of $f(x)$ and one of the roots $\alpha$ of $g(x)$ in $K$. Let
$F_1 = F(\alpha)$. Any isomorphism $\psi: K \to \tilde{K}$ extending $\varphi$ will send $F_1$ to some
subfield $\tilde{F}_1$ of $\tilde{K}$. This field $\tilde{F}_1$ will have the form $\tilde{F}(\tilde{\alpha})$, where $\tilde{\alpha} = \psi(\alpha)$ is a root of
$\tilde{g}(x)$ in $\tilde{K}$.

Conversely, to extend $\varphi$ to $\psi$, we may start by choosing any root $\tilde{\alpha}$ of $\tilde{g}(x)$ in
$\tilde{K}$. We then extend $\varphi$ to a map $\varphi_1: F_1 \to \tilde{F}_1 = \tilde{F}(\tilde{\alpha})$ by setting $\varphi_1(\alpha) = \tilde{\alpha}$. We use
induction on $[K : F]$. Since $[K : F_1] < [K : F]$, the induction hypothesis tells us that
for this particular choice of $\varphi_1$, there are $[K : F_1]$ extensions of $\varphi_1$ to an
isomorphism $\psi: K \to \tilde{K}$. On the other hand, $\tilde{g}$ has distinct roots in $\tilde{K}$ because $g$ and $\tilde{g}$ are
irreducible [Chapter 13 (5.8)]. So the number of choices for $\tilde{\alpha}$ is the degree of $\tilde{g}$,
which is $[F_1 : F]$. There are $[F_1 : F]$ choices for the isomorphism $\varphi_1$. This gives us a
total of $[K : F_1][F_1 : F] = [K : F]$ extensions of $\varphi$ to $\psi: K \to \tilde{K}$. □

Since any two splitting fields $K$ of a polynomial $f(x) \in F[x]$ are isomorphic,
the Galois group $G(K/F)$ depends, up to isomorphism, only on $f$. It is often referred
to as the Galois group of the polynomial over $F$.

The following corollary collects together several criteria for an extension to be
Galois. Most of them have already been proved, and we leave the remaining proofs
as exercises.


(5.5) Corollary. Let $K/F$ be a finite field extension. The following are equivalent:

(i) $K$ is a Galois extension of $F$;

(ii) $K$ is the splitting field of an irreducible polynomial $f(x) \in F[x]$;

(ii') $K$ is the splitting field of a polynomial $f(x) \in F[x]$;

(iii) $F$ is the fixed field for the action of the Galois group $G(K/F)$ on $K$;

(iii') $F$ is the fixed field for an action of a finite group of automorphisms of $K$. □

We now have enough information to prove the Main Theorem of Galois theory,
which relates intermediate fields to subgroups of the Galois group.

Proof of Theorem (1.15). Let $K/F$ be a Galois extension. We have to show
that the maps

$L \mapsto G(K/L)$ and $H \mapsto K^H$

are inverse functions between the set of intermediate fields and the set of subgroups
of $G = G(K/F)$. To do so, we verify that the composition of these two maps in
either direction is the identity.

Let $L$ be an intermediate field. The corresponding subgroup of $G$ is
$H = G(K/L)$. By definition, $H$ acts trivially on $L$, so $L \subset K^H$. On the other hand,
$K$ is a Galois extension of $L$ by (1.13); hence $[K : L] = |H|$. By Theorem (4.6),
$|H| = [K : K^H]$, so $L = K^H$.

In the other direction, suppose that we start with a subgroup $H \subset G$, and
let $L = K^H$. Then $H \subset G(K/L)$. But $|H| = [K : K^H] = [K : L] = |G(K/L)|$.
Therefore $H = G(K/L)$. This shows that the two maps are inverses, as required.
Since $K$ is a Galois extension of $L = K^H$, $[K : L] = |H|$, and $[L : F] = [G : H]$. □

The correspondence given by the Main Theorem has some surrounding details
which we will now discuss. First of all, the correspondence between fields and
subgroups is order reversing, that is, if $L, L'$ are two intermediate fields and if
$H = G(K/L)$, $H' = G(K/L')$ are the corresponding subgroups, then $L \subset L'$ if
and only if $H \supset H'$. This is clear from the definitions of the maps and is consistent
with the relations (1.16).

To complete the picture, we will show that the intermediate fields $L$ which are
Galois extensions of $F$ correspond to the normal subgroups of $G$. Let $L$ be an
intermediate field. An $F$-automorphism $\sigma$ of $K$ will carry $L$ to some intermediate field $\sigma L$
which may or may not be the same as $L$. We call $\sigma L$ a conjugate subfield.


(5.6) Theorem. Let $K/F$ be a Galois extension, and let $L$ be an intermediate field.
Let $H = G(K/L)$ be the corresponding subgroup of $G = G(K/F)$.

(a) Let $\sigma$ be an element of $G$. The subgroup of $G$ which corresponds to the
conjugate subfield $\sigma L$ is the conjugate subgroup $\sigma H \sigma^{-1}$. In other words,
$G(K/\sigma L) = \sigma H \sigma^{-1}$.

(b) $L$ is a Galois extension of $F$ if and only if $H$ is a normal subgroup of $G$. When
this is so, then $G(L/F)$ is isomorphic to the quotient group $G/H$:


(5.7) Diagram.

    K
    |    H = G(K/L) operates on K, fixing L
    L
    |    If H is normal, then G/H ≈ G(L/F) operates here
    F

    G = G(K/F) operates on K, fixing F


(5.8) Example. In the case of the cubic equation (2.1) whose splitting field has
degree 6, the only intermediate extension which is Galois, other than $F$ and $K$, is $F(\delta)$,
which corresponds to the alternating group $H = A_3 \subset S_3$. The Galois group
$G(F(\delta)/F)$ is cyclic of order 2, as is the quotient group $S_3/A_3$. The three fields
$F(\alpha_i)$ are conjugate. This agrees with the fact that the three subgroups of $S_3$ of order
2 are conjugate.
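The group-theoretic side of this example is small enough to verify by brute force. The following sketch represents elements of $S_3$ as tuples acting on $\{0, 1, 2\}$ and checks that $A_3$ is normal while the three subgroups of order 2 form a single conjugacy class. The helper names `compose`, `inverse`, and `conjugate` are ours, introduced only for this illustration.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'p after q' on {0, 1, 2}, as a tuple."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def conjugate(subgroup, g):
    """Return the conjugate subgroup g * subgroup * g^(-1) as a frozenset."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, h), g_inv) for h in subgroup)

S3 = list(permutations(range(3)))
identity = (0, 1, 2)
A3 = frozenset({identity, (1, 2, 0), (2, 0, 1)})
order2 = [frozenset({identity, t}) for t in [(1, 0, 2), (2, 1, 0), (0, 2, 1)]]

a3_is_normal = all(conjugate(A3, g) == A3 for g in S3)
conjugates_of_first = {conjugate(order2[0], g) for g in S3}
print(a3_is_normal, len(conjugates_of_first))
```

Under the correspondence, the normal subgroup $A_3$ matches the Galois intermediate field $F(\delta)$, and the single conjugacy class of order 2 subgroups matches the three conjugate fields $F(\alpha_i)$.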


Proof of Theorem (5.6). (a) Let $\sigma L = L'$. If $\tau$ is an element of
$H = G(K/L)$, then $\sigma\tau\sigma^{-1}$ is in $H' = G(K/L')$. To check this, we must show that
$\sigma\tau\sigma^{-1}$ fixes any element $\alpha' \in L'$. By definition of $\sigma L$, $\alpha' = \sigma(\alpha)$ for some
$\alpha \in L$. Then $\sigma\tau\sigma^{-1}(\alpha') = \sigma\tau(\alpha) = \sigma(\alpha) = \alpha'$, as required. It follows that
$H' \supset \sigma H \sigma^{-1}$ and by symmetry, or by counting elements, that $H' = \sigma H \sigma^{-1}$. The
fact which we have just checked is actually a general property of group actions on
sets [Chapter 5 (6.4)].

(b) Now suppose that $H$ is normal. Then $H = \sigma H \sigma^{-1}$ for all $\sigma \in G$; hence
$G(K/L) = G(K/\sigma L)$. This implies that $L = \sigma L$ for all $\sigma$ [see (1.9)]. Thus every
$F$-automorphism of $K$ carries $L$ to itself and hence defines an $F$-automorphism of $L$ by
restriction. This restriction defines a homomorphism

(5.9) $\pi: G \to G(L/F)$.

Its kernel is the set of $\sigma \in G$ which induce the identity on $L$, which is $H$.
Therefore $G/H$ is isomorphic to a subgroup of $G(L/F)$. Counting degrees and orders, we
find

$[L : F] = |G/H| \leq |G(L/F)|$.

It follows that $L$ is a Galois extension and that $G/H \approx G(L/F)$.

Conversely, suppose that $L/F$ is Galois. Then $L$ is a splitting field of some
polynomial $g(x) \in F[x]$; that is, $L = F(\beta_1, \ldots, \beta_k)$, where $\beta_i$ are the roots of $g(x)$
in $K$. An $F$-automorphism $\sigma$ of $K$ permutes these roots and therefore carries $L$ to
itself: $L = \sigma L$. By (a), $H = \sigma H \sigma^{-1}$; thus $H$ is a normal subgroup. □

6. QUARTIC EQUATIONS

Let $K/F$ be a Galois extension. We have seen that if $\beta$ is an element of $K$ whose
monic irreducible polynomial over $F$ is $g(x)$, then $g$ splits completely in $K$, and the
$G$-orbit of $\beta$ is the set of roots of $g$ (4.4). So $G$ operates transitively on the roots of
an irreducible polynomial $g \in F[x]$, provided that this polynomial has at least one
root in $K$. Combining this observation with Proposition (1.14), we find:

(6.1) Proposition. Let $K/F$ be a splitting field of a polynomial $f(x) \in F[x]$. The
Galois group $G$ of $K/F$ operates faithfully on the set $\{\alpha_1, \ldots, \alpha_n\}$ of roots of $f$. Hence
this operation represents $G$ as a subgroup of the symmetric group $S_n$. The roots form
a single orbit if and only if $f$ is irreducible over $F$. □

When the Galois extension $K$ is exhibited as the splitting field of a polynomial of
degree $n$, it is customary to view the Galois group $G$ as a subgroup of the symmetric
group $S_n$. If the polynomial $f$ is irreducible, then it is a transitive subgroup, which
means that it acts transitively on the indices $\{1, \ldots, n\}$. However, the same Galois
extension $K/F$ can be exhibited as a splitting field of many polynomials, so this
representation of $G$ as a subgroup of $S_n$ is not unique.

For instance, let $K/F$ be the splitting field of an irreducible cubic equation such
that $[K : F] = 6$. Then the Galois group is represented as the whole symmetric
group $S_3$. However, the theorem of the primitive element tells us that $K$ can also be
generated by a single element $\gamma$. Since $[K : F] = 6$, $\gamma$ has degree 6 over $F$. This
means that its orbit has order 6 and that its irreducible polynomial has degree 6. So
if we think of $K$ as the splitting field of this sextic polynomial, the Galois group is
represented as a subgroup of $S_6$. This isn't a very economical way to represent $S_3$.

Let us suppose that our Galois extension $K$ is the splitting field of a polynomial
$f(x)$ and that its roots in $K$ are $\alpha_1, \ldots, \alpha_n$. Then, viewing $G$ as a subgroup of $S_n$, we
may pose the following two problems:

(6.2) (i) Given a subgroup $\mathcal{H}$ of $S_n$, decide if $G \subset \mathcal{H}$.

(ii) Determine $G$.

If we could solve (i) for every subgroup $\mathcal{H}$, then (ii) would also be solved.

Lagrange's approach to these problems is to look for functions of the roots
which are partially symmetric. A partially symmetric polynomial is a polynomial
in the variables $\{u_1, \ldots, u_n\}$ which is left fixed by the permutations in a
given subgroup $\mathcal{H}$ of $S_n$ but not by any other permutations. For example, we saw in
(2.13) that

$(u_1 - u_2)(u_1 - u_3)(u_2 - u_3)$

is a partially symmetric function for the alternating group, when $n = 3$. There is no
difficulty in generalizing this construction to arbitrary $n$ by defining

(6.3) $\delta(u) = (u_1 - u_2)(u_1 - u_3) \cdots (u_{n-1} - u_n) = \prod_{i<j} (u_i - u_j)$.

This element is a square root of the discriminant (3.6). The effect of a permutation
of the indices is to multiply $\delta$ by the sign of the permutation. Having this partially
symmetric function in hand, we substitute the roots $\alpha_1, \ldots, \alpha_n$ of our polynomial into
it, to obtain an element $\delta(\alpha) = \delta$ of $K$ which is fixed by even permutations of the
roots. We can decide whether or not $\delta$ is in $F$ by determining whether or not the
discriminant $D$ is a square. This will provide information about the Galois group.
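The sign behavior of $\delta$ can be confirmed on a numerical sample. The sketch below evaluates $\delta(u) = \prod_{i<j}(u_i - u_j)$ at four arbitrarily chosen integers and compares its value after each of the 24 permutations with the sign of the permutation, computed by counting inversions. The sample values and function names are our own.

```python
from itertools import permutations
from math import prod

def delta(u):
    """delta(u) = product over i < j of (u_i - u_j)."""
    n = len(u)
    return prod(u[i] - u[j] for i in range(n) for j in range(i + 1, n))

def sign(perm):
    """Sign of a permutation of 0..n-1, computed from the parity of its inversions."""
    n = len(perm)
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1

u = (2, 3, 5, 7)  # arbitrary distinct sample values
checks = [
    delta(tuple(u[p[i]] for i in range(len(u)))) == sign(p) * delta(u)
    for p in permutations(range(len(u)))
]
print(all(checks), delta(u) ** 2)
```

Since $\delta$ changes by the sign, its square is unchanged by every permutation, which is why $\delta^2$ is a symmetric function of the roots.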

(6.4) Proposition. Let $K/F$ be a Galois extension which is the splitting field of an
irreducible polynomial $f(x) \in F[x]$ of degree $n$. Let $\alpha_1, \ldots, \alpha_n$ be the roots of $f(x)$ in
$K$, and let $\delta = \delta(\alpha)$. Then $\delta \neq 0$. Moreover:

(a) $\delta \in F$ if and only if the Galois group $G$ is a subgroup of the alternating group
$A_n$.

(b) In any case, the subgroup $G(K/F(\delta))$ of $G$ is contained in the alternating
group.

Proof. The case $\delta = 0$ occurs only if two of the roots are equal, and this can
not happen if $f$ is irreducible [Chapter 13 (5.8)]. Next, assume that $\delta$ is in $F$. Since
odd permutations send $\delta \mapsto -\delta$ and since $\delta \neq 0$, odd permutations don't fix $\delta$. On
the other hand, the elements of $F$ are fixed by every automorphism in $G$. It follows
that $G$ does not contain any odd permutations, hence that $G \subset A_n$. Conversely, if
$\delta \notin F$, we use the fact that $K^G = F$. There must be an element of $G$ which doesn't
fix $\delta$. This element will be an odd permutation, so $G \not\subset A_n$. This proves (a). Part (b)
follows from (a) when we replace $F$ by $F(\delta)$. □

We will now discuss quartic equations, beginning with an interesting special
case which is controlled by the discriminant. We consider a complex number which
is presented as a nested square root, say $\alpha = \sqrt{r + s\sqrt{t}}$, where $r, s, t$ are in a field
$F$. The numbers

(6.5) $\sqrt{3 + 2\sqrt{2}}$, $\sqrt{5 + \sqrt{21}}$, $\sqrt{7 + 2\sqrt{?}}$, $\sqrt{5 + 2\sqrt{5}}$

are a few samples. We ask the following question: Is there an expression for $\alpha$ in
terms of two square roots which are not nested?

Since $\alpha^2 = r + s\sqrt{t}$, it is easy to write down a quartic polynomial which has
$\alpha$ as a root, namely

(6.6) $f(x) = (x^2 - (r + s\sqrt{t}))(x^2 - (r - s\sqrt{t})) = x^4 + bx^2 + c$,

where $b = -2r$ and $c = r^2 - s^2 t$. If $\alpha'$ denotes one of the two square roots of
$r - s\sqrt{t}$, then the roots of this quartic are

(6.7) $\alpha, \alpha', -\alpha, -\alpha'$.

The splitting field $K = F(\alpha, \alpha')$ of $f$ can be reached by the sequence $\sqrt{t}, \alpha, \alpha'$ of
three square root adjunctions, so the degree $[K : F]$ divides 8. The degree will be
less than 8 if one of the square root adjunctions is unnecessary.

We must decide whether or not $f$ is irreducible. To do so, we first check the
irreducibility of the quadratic polynomial $q(y) = y^2 + by + c$ whose roots are
$\alpha^2, \alpha'^2$. If $q$ is irreducible, then $f$ doesn't have a root in $F$. In that case $f$, if
reducible, will be the product of two quadratic polynomials. Computing with
undetermined coefficients, we find that the product must have the form

(6.8) $x^4 + bx^2 + c = (x^2 + ux + v)(x^2 - ux + v)$.

We will be able to determine whether or not such a factorization exists, at least when
$F = \mathbb{Q}$.

If $f(x)$ is reducible, then $\alpha$ is a root of a quadratic polynomial, so it can be
written using only one square root. This happens with $\sqrt{3 + 2\sqrt{2}}$ for example,
which is equal to $1 + \sqrt{2}$, as you will check by squaring both expressions. The
quartics derived from the other examples (6.5) are irreducible over $\mathbb{Q}$.

We now return to our question. Let's suppose that $f$ is irreducible. Notice that
to write $\alpha$ in terms of unnested square roots $\sqrt{a}, \sqrt{b}$ with $a, b \in F$ amounts to finding a
biquadratic extension $K = F(\sqrt{a}, \sqrt{b})$ of $F$ which contains $\alpha$. Suppose that a
biquadratic extension $K$ which contains $\alpha$ can be found. Then $K$ is a Galois extension
of $F$, so $f(x)$ factors into linear factors in $K$. This means that $K$ contains a splitting
field of $f$. In fact, $K$ will be the splitting field, because $f$ is irreducible and of degree
4. So the Galois group $G$ of $f$ will be the Klein four group. If $G$ is not the Klein four
group, then $\alpha$ can not be written in terms of unnested square roots.

Conversely, if $K/F$ is a Galois extension whose Galois group is the Klein four
group, then $K$ contains three intermediate fields of degree 2 over $F$. Any two of
these fields taken together generate $K$. So $K$ is a biquadratic extension of $F$, and any
element of $K$ can be written in terms of two unnested square roots.

We compute the discriminant of $f(x)$, using the list (6.7) of roots.

$D = \prod_{i<j} (\alpha_i - \alpha_j)^2 = (4\alpha\alpha')^2 (\alpha - \alpha')^4 (\alpha + \alpha')^4 = 2^4 (b^2 - 4c)^2 c$

$= 2^8 s^4 t^2 (r^2 - s^2 t)$.

If $D$ is a square in $F$, then $G$ is a transitive subgroup of the alternating group $A_4$
whose order divides 8. The Klein four group is the only such group. It consists of the
even permutations of order 2, together with the identity:

(6.9) $V = \{(1), (12)(34), (13)(24), (14)(23)\}$.

There is no other transitive operation of $V$ on $\{1, 2, 3, 4\}$. So we find:

(6.10) Proposition. Let $\alpha = \sqrt{r + s\sqrt{t}}$ with $r, s, t \in F$, and assume that
$f(x) = x^4 - 2rx^2 + (r^2 - s^2 t)$ is irreducible over $F$. Then $\alpha$ can be written in
terms of two unnested square roots if and only if $r^2 - s^2 t$ is a square in $F$. □
If $\alpha = \sqrt{5 + \sqrt{21}}$, then $r^2 - s^2 t = 25 - 21 = 4$, which is a square. In the
last two examples (6.5), $r^2 - s^2 t$ is not a square in $\mathbb{Q}$.

Let us determine the unnested expression for $\alpha = \sqrt{5 + \sqrt{21}}$ explicitly.
Galois theory provides the clue; namely it suggests determining the intermediate
fields. They are quadratic extensions of $\mathbb{Q}$, so they are generated by square roots.
These square roots are the ones we need to express $\alpha$. One intermediate quadratic
extension is obvious, namely $\mathbb{Q}(\sqrt{21})$. But this isn't the one we need. To find
another intermediate extension, we determine the fixed field of the subgroup $H$ of
order 2 which is generated by $\sigma = (12)(34)$. If the roots of $f$ are listed in the order
(6.7), the $H$-orbit of $\alpha$ is $\{\alpha, \alpha'\}$, where $\alpha' = \sqrt{5 - \sqrt{21}}$, and the irreducible
polynomial for $\alpha$ over $K^H$ is $(x - \alpha)(x - \alpha') = x^2 - (\alpha + \alpha')x + \alpha\alpha'$. So $\alpha$ has
degree 2 over the field $L = F(\alpha + \alpha', \alpha\alpha')$, and this field is contained in $K^H$. A
consideration of degrees shows that $L = K^H$. With this clue, we compute, finding
$\alpha\alpha' = 2$, $(\alpha + \alpha')^2 = 14$, and $\alpha + \alpha' = \sqrt{14}$. Similarly, $\alpha - \alpha' = \sqrt{6}$. We
solve for $\alpha$, obtaining $\alpha = \frac{1}{2}(\sqrt{6} + \sqrt{14})$. □
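The computation above can be turned into a small procedure over $\mathbb{Q}$. The sketch below is ours and makes several assumptions: $r, s, t$ are positive rationals, and only the square condition on $r^2 - s^2 t$ is tested, not the irreducibility hypothesis of Proposition (6.10). Writing $m = \alpha\alpha' = \sqrt{r^2 - s^2 t}$, we have $(\alpha \pm \alpha')^2 = 2r \pm 2m$, so $\alpha = \sqrt{(r + m)/2} + \sqrt{(r - m)/2}$.

```python
from fractions import Fraction
from math import isqrt, sqrt

def rational_sqrt(q):
    """Exact square root of a Fraction, or None if q is not the square of a rational."""
    if q < 0:
        return None
    n, d = q.numerator, q.denominator   # Fraction keeps these in lowest terms
    rn, rd = isqrt(n), isqrt(d)
    if rn * rn == n and rd * rd == d:
        return Fraction(rn, rd)
    return None

def denest(r, s, t):
    """Try to write sqrt(r + s*sqrt(t)) as sqrt(p) + sqrt(q) with rational p, q.

    Assumes r, s, t > 0. Returns (p, q) if r^2 - s^2*t is a rational square, else None.
    """
    r, s, t = Fraction(r), Fraction(s), Fraction(t)
    m = rational_sqrt(r * r - s * s * t)    # m = alpha * alpha'
    if m is None:
        return None
    return (r + m) / 2, (r - m) / 2

p, q = denest(5, 1, 21)
error = abs(sqrt(5 + sqrt(21)) - (sqrt(p) + sqrt(q)))
print(p, q, denest(5, 2, 5))
```

For $\sqrt{5 + \sqrt{21}}$ this gives $p = 7/2$ and $q = 3/2$, and $\sqrt{7/2} + \sqrt{3/2} = \frac{1}{2}(\sqrt{14} + \sqrt{6})$, in agreement with the text. For $\sqrt{5 + 2\sqrt{5}}$ the function returns `None`, since $25 - 20 = 5$ is not a square.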


It is harder to analyze a general quartic equation, and the roots can usually not
be written explicitly in a useful way. However, there is another partially symmetric
function which helps to determine the Galois group. Let $f(x)$ be an irreducible
quartic polynomial with roots $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4\}$ in a splitting field $K$. Then by Proposition
(6.1), its Galois group is a subgroup of $S_4$, and the roots form one orbit. The
transitive subgroups of $S_4$ are

(6.11) $S_4, A_4, D_4, C_4, V$,

where $V$ is the group (6.9). Actually, there are three conjugate subgroups
isomorphic to $D_4$ and three conjugate subgroups isomorphic to $C_4$. The other subgroups are
uniquely determined. There are some other subgroups of $S_4$ which are isomorphic to
the Klein four group, but they are not transitive.

Let us ask for partially symmetric functions of the roots to distinguish these
groups. As we have seen, the element $\delta$ determines whether or not $G \subset A_4$. The
subgroups of $A_4$ in our list are $A_4$ and $V$. So $\delta \in F$ if and only if $G$ is one of these
two groups.

Next, we consider the partially symmetric polynomial

(6.12) $\beta_1(u) = u_1 u_3 + u_2 u_4$.

A permutation of the indices carries $\beta_1(u)$ to one of the three polynomials
$\beta_i(u)$, $i = 1, 2, 3$, where

$\beta_2(u) = u_1 u_2 + u_3 u_4$ and $\beta_3(u) = u_1 u_4 + u_2 u_3$.

Since $S_4$ has order 24, the stabilizer of $\beta_1(u)$ is of order 8; it is one of the three
dihedral groups $D_4$. The polynomial $(x - \beta_1(u))(x - \beta_2(u))(x - \beta_3(u))$ is left fixed by
all permutations of the variables $u_i$, so its coefficients are symmetric functions. They
can be computed explicitly in terms of the elementary symmetric functions.
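The orbit and stabilizer counts can be checked by enumeration. In the sketch below, a polynomial such as $u_1 u_3 + u_2 u_4$ is encoded by the way it pairs up the indices, here $\{1, 3\}, \{2, 4\}$, written 0-based; this encoding and the helper name `act` are ours, chosen only for illustration.

```python
from itertools import permutations

def act(perm, pairing):
    """Apply a permutation of {0, 1, 2, 3} to a pairing such as {{0, 2}, {1, 3}}."""
    return frozenset(frozenset(perm[i] for i in pair) for pair in pairing)

# beta_1 = u1*u3 + u2*u4 corresponds to the pairing {1,3},{2,4} (0-based: {0,2},{1,3}).
beta1 = frozenset({frozenset({0, 2}), frozenset({1, 3})})

S4 = list(permutations(range(4)))
orbit = {act(p, beta1) for p in S4}
stabilizer = [p for p in S4 if act(p, beta1) == beta1]
print(len(orbit), len(stabilizer))
```

The orbit has three elements, corresponding to $\beta_1, \beta_2, \beta_3$, and the stabilizer has $24/3 = 8$ elements, as the counting formula requires.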
Going back to our quartic polynomial, we substitute the roots $\alpha_i$ into $\beta_j(u)$, to
obtain three elements $\beta_j(\alpha) = \beta_j \in K$. They form one orbit under the action of the
symmetric group on the roots. If they are distinct elements of $K$, then the stabilizer
of $\beta_1$ in $S_4$ will have order 8, so it will be the dihedral group $D_4$. We are lucky: The
$\beta_j$ are distinct. For example,

$\beta_1 - \beta_2 = \alpha_1\alpha_3 + \alpha_2\alpha_4 - \alpha_1\alpha_2 - \alpha_3\alpha_4 = (\alpha_1 - \alpha_4)(\alpha_3 - \alpha_2)$.

Since we have assumed that $f$ is irreducible, its roots are distinct. The right side of
this equation shows that $\beta_1 - \beta_2 \neq 0$.

Since the Galois group $G$ permutes the elements $\beta_j$, the polynomial
$g(x) = (x - \beta_1)(x - \beta_2)(x - \beta_3)$ has coefficients in $F$. It is called the resolvent cubic of
the quartic polynomial $f(x)$.

Though the symmetric group acts transitively on $\{\beta_1, \beta_2, \beta_3\}$, the Galois group
$G$, which is a subgroup of $S_4$, may not act transitively. Whether or not it does
provides information about $G$. If $G$ fixes $\beta_1$, for example, then $G$ is contained in the
stabilizer $D_4$ of $\beta_1$. In this case $\beta_1$ will be in the field $F$ (1.9), so the resolvent cubic
will have a root in $F$. Proceeding as in the proof of Proposition (6.4), we find the
following:

(6.13) Proposition. Let $g(x)$ be the resolvent cubic of an irreducible quartic
polynomial $f(x)$, and let $K$ be a splitting field of $f$. Then $g(x)$ has a root in $F$ if and only
if the Galois group $G = G(K/F)$ is a subgroup of one of the dihedral groups $D_4$. In
any case, if $\beta$ is a root of $g(x)$ in $K$, then the Galois group $G(K/F(\beta))$ is a subgroup
of a dihedral group $D_4$. □

Thus the polynomials $x^2 - D$, where $D$ is the discriminant, and the resolvent
cubic $g(x)$ nearly suffice to describe the Galois group. The results are summed up in
this table:

(6.14) Table.

                     D a square in F      D not a square in F

    g reducible      G = V                G = D_4 or C_4
    g irreducible    G = A_4              G = S_4

Here $V$ denotes the Klein four group.

Explicit computation for arbitrary quartic equations becomes unpleasant, but
we can easily calculate the discriminant of a quartic which has the form

(6.15) $x^4 + rx + s.$

The discriminant is a symmetric polynomial of degree 12 and therefore has weighted
degree 12 in the elementary symmetric functions $s_1, \ldots, s_4$. Substituting $(0, 0, -r, s)$
for $(s_1, s_2, s_3, s_4)$ into the unknown formula for the discriminant will kill any
monomial involving $s_1$ or $s_2$. And the only monomials of weighted degree 12 which do not
involve $s_1$ and $s_2$ are $s_3^4$ and $s_4^3$. Thus the discriminant of (6.15) has the form

$D = \Delta(0, 0, -r, s) = c r^4 + c' s^3.$

We can determine the coefficients $c, c'$ by computing the discriminant of two
particular polynomials. The answer is

(6.16) $D = -27r^4 + 256s^3.$

For example, the discriminant of

(6.17) $f(x) = x^4 + 8x + 12$

is $3^4 \cdot 2^{12}$. This is a square in $\mathbf{Q}$. The Galois group of the splitting field of (6.17)
over $\mathbf{Q}$ is therefore a subgroup of $A_4$.
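The arithmetic behind this example is easy to check by machine. The following is a minimal sketch, assuming only formula (6.16); the function names are illustrative and do not come from the text. For an integer, being a square in Q is the same as being a square in Z, which is what the helper tests.

```python
from math import isqrt


def quartic_discriminant(r: int, s: int) -> int:
    """Discriminant of x^4 + r*x + s, computed from formula (6.16)."""
    return -27 * r**4 + 256 * s**3


def is_perfect_square(n: int) -> bool:
    """True when the integer n is the square of an integer."""
    return n >= 0 and isqrt(n) ** 2 == n


# The quartic (6.17): x^4 + 8x + 12.
D = quartic_discriminant(8, 12)
print(D, D == 3**4 * 2**12, is_perfect_square(D))
```

Changing the pair (r, s) shows how rarely the discriminant is a square; for instance r = s = 1 gives D = 229, which is not one.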

To calculate the resolvent cubic $g(x)$ of the polynomial (6.15), we write the
resolvent cubic for the general polynomial whose roots are $u_1, \ldots, u_4$ as

$g(x) = x^3 - b_1 x^2 + b_2 x - b_3;$

then since $\beta_i$ is a quadratic function in $\{u_j\}$, $b_i$ has degree $2i$ in $\{u_j\}$ and weighted
degree $2i$ in the symmetric functions $s_j$. Proceeding as above, one finds

(6.18) $g(x) = x^3 - 4sx - r^2.$

The resolvent cubic of the particular quartic polynomial (6.17) is $x^3 - 48x - 64$.
The quartic (6.17) and its resolvent cubic are both irreducible over $\mathbf{Q}$. It follows that
$G = A_4$ for the polynomial (6.17).
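The irreducibility of the resolvent cubic can be confirmed with the rational root test, since a cubic with no rational root is irreducible over Q. The sketch below assumes formula (6.18); the helper names are hypothetical. It handles only the cubic: the irreducibility of the quartic itself needs a separate argument, because a quartic without rational roots could still split into two quadratics.

```python
def resolvent_cubic(r: int, s: int):
    """Coefficients (highest degree first) of x^3 - 4s*x - r^2, formula (6.18)."""
    return (1, 0, -4 * s, -r * r)


def integer_roots(coeffs):
    """Integer roots of a monic integer polynomial with nonzero constant term.

    By the rational root test, every rational root of such a polynomial
    is an integer dividing the constant term.
    """
    constant = abs(coeffs[-1])
    divisors = [d for d in range(1, constant + 1) if constant % d == 0]

    def value(x):
        acc = 0
        for a in coeffs:          # Horner evaluation
            acc = acc * x + a
        return acc

    return [x for d in divisors for x in (d, -d) if value(x) == 0]


g = resolvent_cubic(8, 12)        # resolvent cubic of the quartic (6.17)
print(g, integer_roots(g))
```

An empty list of roots means that the cubic has no linear factor over Q, hence no factor at all.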


7. KUMMER EXTENSIONS 

Let us now consider the splitting field over a field $F$ of a polynomial of the form

(7.1) $f(x) = x^p - a,$

where $p$ is a prime. We will assume that the base field $F$ is a subfield of $\mathbf{C}$ which
contains the primitive $p$th root of unity $\zeta_p = e^{2\pi i/p}$. The complex roots of $f(x)$ are
the $p$th roots of $a$, and if $\alpha$ denotes a particular $p$th root, then the roots of $f(x)$ are

(7.2) $\alpha, \zeta\alpha, \zeta^2\alpha, \ldots, \zeta^{p-1}\alpha,$

where $\zeta = \zeta_p$. Therefore the splitting field is generated by a single root: $K = F(\alpha)$.

(7.3) Proposition. Let $F$ be a subfield of $\mathbf{C}$ which contains the $p$th root of unity
$\zeta_p$, and let $a$ be an element of $F$ which is not a $p$th power in $F$. Then the splitting
field of $f(x) = x^p - a$ has degree $p$ over $F$, and its Galois group is a cyclic group of
order $p$.

Proof. Let $K$ be a splitting field of $f$, and let $\alpha$ be one of its roots in $K$. Since
$a$ is not a $p$th power in $F$, the root $\alpha$ is not in $F$. Then there is an automorphism $\sigma$
of $K/F$ which does not fix $\alpha$. Since the roots of $f$ are $\zeta^i\alpha$, $i = 0, \ldots, p - 1$, we have
$\sigma(\alpha) = \zeta^\nu\alpha$ for some $\nu \not\equiv 0$ (modulo $p$). We now compute the powers of $\sigma$.
Remembering that $\sigma$ is an automorphism and that $\sigma(\zeta) = \zeta$ because $\zeta \in F$, we find
$\sigma^2(\alpha) = \sigma(\zeta^\nu\alpha) = \zeta^\nu\sigma(\alpha) = \zeta^{2\nu}\alpha$. Similarly, $\sigma^i(\alpha) = \zeta^{i\nu}\alpha$ for each $i$. Since $\zeta$ is a
primitive $p$th root of unity and $p$ does not divide $\nu$, the smallest positive power of
$\sigma$ which fixes $\alpha$ is $\sigma^p$. Hence the order of $\sigma$ in the Galois group is at least $p$. On the
other hand, $\alpha$ generates $K$ over $F$, and $\alpha$ is a root of the polynomial $x^p - a$ of
degree $p$, so $[K : F] \le p$. This shows at the same time that $[K : F] = p$, that $x^p - a$
is irreducible over $F$, and that $G(K/F)$ is cyclic of order $p$. □

Here is a striking converse to Proposition (7.3): 

(7.4) Theorem. Let $F$ be a subfield of $\mathbf{C}$ which contains the $p$th root of unity $\zeta_p$,
and let $K/F$ be a Galois extension of degree $p$. Then $K$ is obtained by adjoining a
$p$th root to $F$.

Extensions of this type are often called Kummer extensions. For $p = 2$, the theorem
reduces to a familiar assertion: Every extension of degree 2 can be obtained by
adjoining a square root. But suppose that $p = 3$ and that $F$ contains $\zeta_3$. If the
discriminant of the irreducible cubic polynomial (2.3) is a square in $F$, then the splitting field
of $f$ has degree 3 [see (2.16)], so its Galois group is a cyclic group. Therefore the
splitting field of such a polynomial has the form $F(\sqrt[3]{a})$, for some $a \in F$. This isn't
obvious.

Proof of Theorem (7.4). The Galois group $G$ has prime order $p = [K : F]$, so
it is a cyclic group. Any element $\sigma$, not the identity, will generate it. Let us view $K$
as an $F$-vector space. Then $\sigma$ is a linear operator on $K$. For, since $\sigma$ is an
$F$-automorphism,

$\sigma(\alpha + \beta) = \sigma(\alpha) + \sigma(\beta)$ and $\sigma(c\alpha) = \sigma(c)\sigma(\alpha) = c\,\sigma(\alpha),$

for all $c \in F$ and $\alpha, \beta \in K$. Since $G$ is a cyclic group of order $p$, $\sigma^p = 1$. An
eigenvalue $\lambda$ for this operator must satisfy the relation $\lambda^p = 1$, which means that $\lambda$ is a
power of $\zeta_p$. By hypothesis, these eigenvalues are in the field $F$. Moreover, there is at
least one eigenvalue different from 1. This is a fact about any linear operator $T$ such
that some power of $T$ is the identity, because such a linear operator can be
diagonalized [Chapter 9 (2.3)]. Its eigenvalues are the entries of the diagonal matrix $\Lambda$ which
represents it. If $T$ is not the identity, as is the case here, then $\Lambda \neq I$, so some
diagonal entry is different from 1.

We choose an eigenvector $\alpha$ with an eigenvalue $\lambda \neq 1$. Then $\sigma(\alpha) = \lambda\alpha$,
and hence $\sigma(\alpha^p) = \sigma(\alpha)^p = (\lambda\alpha)^p = \lambda^p\alpha^p = \alpha^p$. So $\sigma$ fixes $\alpha^p$. Since $\sigma$
generates $G$, the element $\alpha^p$ is in the fixed field $K^G$, which is $F$ (1.9). We have therefore
found an element $\alpha \in K$ whose $p$th power is in $F$. Since $\sigma(\alpha) \neq \alpha$, the element $\alpha$
is not in $F$ itself. Since $[K : F]$ is prime, $\alpha$ generates $K$. □

(7.5) Example. Consider the cyclic cubic polynomial (2.12) $x^3 - 3x + 1$. Let
$\{\eta_1, \eta_2, \eta_3\}$ denote its roots. There is an element $\sigma \in G(K/F)$ acting as a cyclic
permutation. We choose the basis $(1, \eta_1, \eta_2)$ for $K$ over $F = \mathbf{Q}(\zeta_3)$. (Why is it a
basis?) With respect to this basis, the matrix of the linear operator $\sigma$ is

$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix},$

because $\sigma(1) = 1$, $\sigma(\eta_1) = \eta_2$, $\sigma(\eta_2) = \eta_3 = -\eta_1 - \eta_2$. The vector $(0, 1, -\zeta_3)^t$
is an eigenvector with eigenvalue $\zeta_3$. Thus if $\alpha = \eta_1 - \zeta_3\eta_2$, then $\alpha^3$ is an element
of $F$, and $\alpha$ generates the splitting field of $x^3 - 3x + 1$ over $F$. We can compute $\alpha^3$
explicitly, using the fact that $\eta_1 = \zeta_9 + \zeta_9^8$ and $\eta_2 = \zeta_9^2 + \zeta_9^7$. Noting that
$\zeta_3 = \zeta_9^3$, we find $\alpha = \zeta_9^8 - \zeta_9^5$ and $\alpha^3 = 3(1 - \zeta_3)$. □
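The identities in this example can be checked in floating point. This is a sketch only: it assumes the roots are labeled as in the example, with the third root taken to be the sum of the fourth and fifth powers of the ninth root of unity, and all variable names are illustrative. A numerical check of this kind is evidence, not a proof.

```python
import cmath

zeta9 = cmath.exp(2j * cmath.pi / 9)
zeta3 = zeta9 ** 3

# The three roots of x^3 - 3x + 1, written as sums of ninth roots of unity.
eta1 = zeta9 + zeta9 ** 8
eta2 = zeta9 ** 2 + zeta9 ** 7
eta3 = zeta9 ** 4 + zeta9 ** 5


def f(x):
    return x ** 3 - 3 * x + 1


alpha = eta1 - zeta3 * eta2
# sigma cycles eta1 -> eta2 -> eta3 and fixes zeta3, so this is sigma(alpha).
sigma_alpha = eta2 - zeta3 * eta3

residual_roots = max(abs(f(e)) for e in (eta1, eta2, eta3))
residual_eigen = abs(sigma_alpha - zeta3 * alpha)
residual_cube = abs(alpha ** 3 - 3 * (1 - zeta3))
print(residual_roots, residual_eigen, residual_cube)
```

All three residuals should be at the level of rounding error, in agreement with the eigenvector relation and with the stated value of the cube.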

(7.6) Example. Let $f(x)$ be an arbitrary irreducible cubic polynomial over a field $F$,
and let $K$ be a splitting field of $f(x)(x^3 - 1)$ over $F$. Let $L \subset K$ be the intermediate
field generated by $\zeta_3$ and $\delta = \sqrt{D}$, where $D$ is the discriminant of $f$. Then $[L : F]$
divides 4, and $[K : L] = 3$, by (2.16). The four elements $\{1, \sqrt{D}, \sqrt{-3}, \sqrt{-3D}\}$
span $L$ as $F$-vector space in any case. By Theorem (7.4), $K = L(\sqrt[3]{b})$, for some
$b \in L$. Therefore the roots of $f(x)$ admit some expression in terms of a cube root of
the form

$\sqrt[3]{c_1 + c_2\sqrt{D} + c_3\sqrt{-3} + c_4\sqrt{-3D}}$, with $c_i \in F$. □


8. CYCLOTOMIC EXTENSIONS 

The subfield $K$ of the complex numbers which is generated over $\mathbf{Q}$ by $\zeta_n = e^{2\pi i/n}$ is
called a cyclotomic field. Also, for any subfield $F$ of $\mathbf{C}$, the field $F(\zeta_n)$ is called a
cyclotomic extension of $F$. It is the splitting field over $F$ of the polynomial

(8.1) $x^n - 1.$

If we denote $\zeta_n$ by $\zeta$, the roots of this polynomial are the powers of $\zeta$, the $n$th roots
of unity $1, \zeta, \zeta^2, \ldots, \zeta^{n-1}$. We will concentrate on the case that $n$ is a prime integer $p$
different from 2 in this section.

The polynomial $x^{p-1} + \cdots + x + 1$ is irreducible over $\mathbf{Q}$, and $\zeta = \zeta_p$ is one
of its roots [Chapter 11 (4.6)]. So it is the irreducible polynomial for $\zeta$ over $\mathbf{Q}$. Its
roots are the powers $\zeta, \zeta^2, \ldots, \zeta^{p-1}$. Hence the Galois group of $\mathbf{Q}(\zeta)$ over $\mathbf{Q}$ has
order $p - 1$.

(8.2) Proposition. Let $p$ be a prime integer, and let $\zeta = \zeta_p$.

(a) The Galois group of $\mathbf{Q}(\zeta)$ over $\mathbf{Q}$ is isomorphic to the multiplicative group $\mathbf{F}_p^\times$
of nonzero elements of the prime field $\mathbf{F}_p$. It is a cyclic group of order $p - 1$.

(b) For any subfield $F$ of $\mathbf{C}$, the Galois group of $F(\zeta)$ over $F$ is a cyclic group.

Proof. Let $G$ be the Galois group of $K = F(\zeta)$ over $F$. We define a map
$v: G \to \mathbf{F}_p^\times$ as follows: Let $\sigma \in G$ be an automorphism. It will carry $\zeta$ to another
root of the polynomial $x^{p-1} + \cdots + x + 1$, say to $\zeta^i$. The exponent $i$ is determined as
an integer modulo $p$, because $\zeta$ has multiplicative order $p$. We set $v(\sigma) = i$. Let us
verify that $v$ is multiplicative: If $\tau$ is another element of $G$ such that $v(\tau) = j$, that
is, $\tau(\zeta) = \zeta^j$, then

(8.3) $\sigma\tau(\zeta) = \sigma(\zeta^j) = \sigma(\zeta)^j = \zeta^{ij}.$

Also, the identity automorphism sends $\zeta$ to $\zeta$, and hence $v(1) = 1$. Since $v$ is
compatible with multiplication and $v(\sigma) \neq 0$, $v$ is a homomorphism to $\mathbf{F}_p^\times$. The
homomorphism is injective because, since $\zeta$ generates $K$, the action of an automorphism is
determined when we know its action on $\zeta$. Thus $G$ is isomorphic to its image in $\mathbf{F}_p^\times$.
Since $\mathbf{F}_p^\times$ is a cyclic group, so is every subgroup. Therefore $G$ is cyclic. If $F = \mathbf{Q}$,
then $|G| = |\mathbf{F}_p^\times| = p - 1$, so these two groups are isomorphic. □

Suppose that $F = \mathbf{Q}$. Then being cyclic and of order $p - 1$, the Galois group
$G$ of $K = \mathbf{Q}(\zeta_p)$ has exactly one subgroup of order $k$ for each integer $k$ which divides
$p - 1$. If $(p - 1)/k = r$ and if $\sigma$ is a generator for $G$, then the subgroup of order $k$
is generated by $\sigma^r$. So by the Main Theorem of Galois theory, there will be exactly
one intermediate field $L$ with $[L : \mathbf{Q}] = r$. These fields are generated by certain
sums of powers of $\zeta = \zeta_p$. We will illustrate this by some simple examples.
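Since the Galois group is identified with the group of nonzero residues modulo p, its subgroups can be listed by plain modular arithmetic. The sketch below is one way to do this; the function names are illustrative, and the brute-force search for a generator is chosen for clarity rather than speed.

```python
def multiplicative_order(a: int, p: int) -> int:
    """Order of a in the group of nonzero residues modulo the prime p (a not divisible by p)."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k


def generator(p: int) -> int:
    """Smallest generator of the cyclic group of nonzero residues modulo the prime p."""
    return next(a for a in range(1, p) if multiplicative_order(a, p) == p - 1)


def subgroup_of_order(k: int, p: int) -> list:
    """The unique subgroup of order k (k must divide p - 1), generated by g**r with r = (p - 1)/k."""
    g, r = generator(p), (p - 1) // k
    h = pow(g, r, p)
    return sorted({pow(h, e, p) for e in range(k)})


p = 17
print("generator:", generator(p))
for k in (1, 2, 4, 8, 16):
    print(k, subgroup_of_order(k, p))
```

Each residue i in the subgroup of order k stands for the automorphism carrying the root of unity to its ith power, and the fixed field of that subgroup has degree (p - 1)/k over Q.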

The simplest case is $p = 5$. Then $[K : \mathbf{Q}] = 4$, and there is an intermediate
field of degree 2 over $\mathbf{Q}$. It is generated by $\eta = \zeta + \zeta^4 = 2\cos 2\pi/5$. Since
$2\cos 2\pi/5 = \frac{1}{2}(-1 + \sqrt{5})$, the intermediate field is the quadratic number field
$\mathbf{Q}(\sqrt{5})$.

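(8.4) Proposition. The subfield $L$ of $K = \mathbf{Q}(\zeta_p)$ whose degree over $\mathbf{Q}$ is $\frac{1}{2}(p - 1)$
is generated over $\mathbf{Q}$ by the element $\eta = \zeta_p + \zeta_p^{-1} = 2\cos 2\pi/p$. Moreover,

$L = K \cap \mathbf{R}.$

Since $L = K \cap \mathbf{R}$, $L$ is also called the real subfield of $K$.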

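Proof. Notice that $\zeta$ is a root of the quadratic polynomial $x^2 - \eta x + 1$, which
has coefficients in $\mathbf{Q}(\eta)$. Therefore $[K : \mathbf{Q}(\eta)] \le 2$. On the other hand, $\eta$ is a real
number, while $\zeta$ is not real, so $\mathbf{Q}(\eta) < K$. It follows that $[K : \mathbf{Q}(\eta)] = 2$, that
$\mathbf{Q}(\eta) = K \cap \mathbf{R}$, and that $[\mathbf{Q}(\eta) : \mathbf{Q}] = \frac{1}{2}(p - 1)$. □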
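The irreducible polynomial of the generator of the real subfield can be found numerically for small primes. The sketch below assumes, as is true for this element, that the polynomial whose roots are the sums of a pth root of unity and its inverse has integer coefficients, so that rounding the floating-point coefficients is safe for small p. The function name is hypothetical.

```python
import cmath


def real_subfield_polynomial(p: int) -> list:
    """Coefficients (highest degree first) of the monic polynomial whose roots are
    zeta^k + zeta^(-k) for k = 1, ..., (p - 1)/2, where zeta = exp(2*pi*i/p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    roots = [zeta ** k + zeta ** (-k) for k in range(1, (p - 1) // 2 + 1)]
    coeffs = [1 + 0j]
    for root in roots:
        # Multiply the current polynomial by (x - root).
        coeffs = [a - root * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    # The exact coefficients are integers, so round away the numerical noise.
    return [round(c.real) for c in coeffs]


for p in (5, 7):
    print(p, real_subfield_polynomial(p))
```

For p = 5 the result is the quadratic with roots (-1 ± √5)/2, in agreement with the computation of 2 cos 2π/5 above; in each case the degree is (p - 1)/2, as the proposition predicts.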

When $p = 7$, $\eta = \zeta + \zeta^6$ has degree 3 over $\mathbf{Q}$. Its irreducible polynomial
over $\mathbf{Q}$ can be computed by a method which we have used before (2.12). We guess
that the other roots are $\eta_2 = \zeta^2 + \zeta^5$ and $\eta_3 = \zeta^3 + \zeta^4$. These are the other sums
of a $p$th root and its inverse. It is not hard to show that $\{\eta_1, \eta_2, \eta_3\}$ is the $G$-orbit of
$\eta = \eta_1$, so this guess can be justified formally. We expand
$(x - \eta_1)(x - \eta_2)(x - \eta_3)$ and use the relation $\zeta^6 + \cdots + \zeta + 1 = 0$, obtaining
the irreducible equation $x^3 + x^2 - 2x - 1$ for $\eta$ over $\mathbf{Q}$.

The cyclotomic field $\mathbf{Q}(\zeta_7)$ also contains a quadratic extension of $\mathbf{Q}$. It is
generated by $\epsilon = \zeta + \zeta^2 + \zeta^4$. If we set $\epsilon' = \zeta^3 + \zeta^5 + \zeta^6$, then
$(x - \epsilon)(x - \epsilon') = x^2 + x + 2$ is its irreducible equation. The discriminant of this polynomial is $-7$, so
$\mathbf{Q}(\epsilon) = \mathbf{Q}(\sqrt{-7})$. It follows that $\mathbf{Q}(\zeta_7)$ contains $\sqrt{-7}$.

Suppose that $p = 17$. Then $[\mathbf{Q}(\zeta) : \mathbf{Q}] = 16$. A cyclic group of order 16
contains a chain of subgroups $C_{16} \supset C_8 \supset C_4 \supset C_2 \supset C_1$. By the Main Theorem of
Galois theory, there is a corresponding chain of intermediate fields
$\mathbf{Q} \subset F_1 \subset F_2 \subset F_3 \subset \mathbf{Q}(\zeta)$, of degrees 1, 2, 4, 8, 16 over $\mathbf{Q}$. The field $F_3$ of degree 8 is the real
subfield generated by $\eta = 2\cos 2\pi/17$, as in Proposition (8.4). Since each
extension in this chain has degree 2, $F_3$ can be reached by a succession of three square
root adjunctions. This proves that $2\cos 2\pi/17$, and hence the regular 17-gon, can
be constructed by ruler and compass [Chapter 13 (4.9)].
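The three square root adjunctions can be made explicit. The sketch below evaluates the classical nested-radical expression for cos 2π/17, usually attributed to Gauss. The formula is quoted here as an outside fact and is not derived in the text; the code only confirms it numerically.

```python
from math import cos, pi, sqrt


def cos_2pi_over_17() -> float:
    """cos(2*pi/17) from nested square roots (classical closed form, quoted, not derived here)."""
    a = sqrt(17)                          # first adjunction: sqrt(17)
    b = sqrt(34 - 2 * a)                  # second adjunction
    c = sqrt(34 + 2 * a)                  # same field as b, because b * c = 8 * sqrt(17)
    d = sqrt(17 + 3 * a - b - 2 * c)      # third adjunction
    return (-1 + a + b + 2 * d) / 16


difference = abs(cos_2pi_over_17() - cos(2 * pi / 17))
print(cos_2pi_over_17(), cos(2 * pi / 17), difference)
```

The three levels of nesting correspond to the three quadratic steps in the chain of fields from Q up to the real subfield of degree 8.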

The other field extension which we will describe for all primes is the one of
degree 2 over $\mathbf{Q}$. The Main Theorem of Galois theory tells us that there is a unique
intermediate field $L$ of degree 2 over $\mathbf{Q}$, corresponding to the subgroup $H$ of $G$ of
order $\frac{1}{2}(p - 1)$. If $\sigma$ generates $G$, then $H$ is generated by $\sigma^2$.


(8.5) Theorem. Let $p$ be an odd prime, and let $L$ be the unique quadratic
extension of $\mathbf{Q}$ contained in the cyclotomic field $\mathbf{Q}(\zeta_p)$. Then

$L = \mathbf{Q}(\sqrt{\pm p}),$

where the sign is $(-1)^{(p-1)/2}$.

Proof. We need to select a generator of $L$ whose equation is easy to determine.
Gauss's method is to take the sum of half of the powers of $\zeta$, suitably chosen.

There is another choice of generator for $L$ which is a little simpler to work
with. Let $D$ be the discriminant of the polynomial

(8.6) $x^p - 1.$

This discriminant can be computed directly in terms of the roots $\{1, \zeta, \zeta^2, \ldots, \zeta^{p-1}\}$,
but it is easier to determine $D$ using the following nice formula:

(8.7) Lemma. Let $f(x) = (x - \alpha_1) \cdots (x - \alpha_n)$. The discriminant of $f$ is

$D = \pm f'(\alpha_1) \cdots f'(\alpha_n) = \pm \prod_i f'(\alpha_i),$

where $f'$ is the derivative.

Proof. By the product rule for differentiation,

$f'(x) = \sum_{i=1}^{n} (x - \alpha_1) \cdots (x - \alpha_{i-1})(x - \alpha_{i+1}) \cdots (x - \alpha_n).$

Therefore

$f'(\alpha_i) = (\alpha_i - \alpha_1) \cdots (\alpha_i - \alpha_{i-1})(\alpha_i - \alpha_{i+1}) \cdots (\alpha_i - \alpha_n).$

This is the product of the differences $(\alpha_i - \alpha_j)$, with the given $i$ and with $j \neq i$.
Thus

$\prod_i f'(\alpha_i) = \prod_{i \neq j} (\alpha_i - \alpha_j) = \pm D.$ □
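The lemma is easy to test on polynomials with integer roots, where all the arithmetic is exact. The sketch below computes both sides directly from the roots. The explicit sign used in the comparison, -1 raised to the power n(n - 1)/2, is a standard refinement that the lemma itself leaves unspecified; it is included here as an assumption to be tested, and the function names are illustrative.

```python
from itertools import combinations
from math import prod


def discriminant(roots):
    """D = product over i < j of (a_i - a_j)^2."""
    return prod((a - b) ** 2 for a, b in combinations(roots, 2))


def product_of_derivatives(roots):
    """Product over i of f'(a_i), where f'(a_i) = product over j != i of (a_i - a_j)."""
    return prod(
        prod(a - b for j, b in enumerate(roots) if j != i)
        for i, a in enumerate(roots)
    )


roots = [0, 1, 3, 7]
n = len(roots)
sign = (-1) ** (n * (n - 1) // 2)
print(discriminant(roots), product_of_derivatives(roots), sign)
```

For four roots each of the six pairs contributes one sign change, so the two quantities agree; for three roots they differ by a factor of -1.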

We apply this lemma to our polynomial $x^p - 1$. Its derivative is $px^{p-1}$, so the
discriminant is

$D = \pm \prod_i p\,\zeta^{i(p-1)} = \pm p^p \zeta^N,$

where the exponent $N$ is some integer. To determine $\zeta^N$, we note that $D$ is a rational
number, because the coefficients of $x^p - 1$ are rational. The only power of $\zeta$ which
is rational is 1. Therefore $\zeta^N = 1$ and

(8.8) $D = \pm p^p.$

The square root of this discriminant is $\delta = \sqrt{\pm p^p}$. It is in the field $\mathbf{Q}(\zeta)$.
Since $p$ is odd and since square factors can be pulled out of a square root,

(8.9) $\mathbf{Q}(\delta) = \mathbf{Q}(\sqrt{\pm p}).$

Therefore this field is a quadratic subfield of $\mathbf{Q}(\zeta)$, and since $L$ is the only quadratic
subfield, it is $L$. We leave the determination of the sign as an exercise. □
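The sign in Theorem (8.5) can be observed numerically by following Gauss's method. The sketch below assumes that "the sum of half of the powers" refers to the usual quadratic Gauss sum, in which powers with quadratic residue exponents are added and the others subtracted; that reading is an interpretation, not something the text states. The check is numerical and does not replace the exercise.

```python
import cmath


def gauss_sum(p: int) -> complex:
    """Sum of zeta^a over quadratic residues a minus the sum over nonresidues, for an odd prime p."""
    zeta = cmath.exp(2j * cmath.pi / p)
    total = 0j
    for a in range(1, p):
        # Euler's criterion: a is a quadratic residue iff a^((p-1)/2) = 1 modulo p.
        legendre = 1 if pow(a, (p - 1) // 2, p) == 1 else -1
        total += legendre * zeta ** a
    return total


squares = {}
for p in (3, 5, 7, 11, 13, 17):
    squares[p] = gauss_sum(p) ** 2
    print(p, round(squares[p].real), (-1) ** ((p - 1) // 2) * p)
```

In each case the square of the sum is p or -p according to the sign in the theorem, so the sum itself is a square root of ±p lying in the cyclotomic field.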

The following theorem, first stated by Kronecker, is one of the most beautiful
theorems of algebraic number theory. Unfortunately, it would take too long to prove
it here.


(8.10) Theorem. Every Galois extension of $\mathbf{Q}$ whose Galois group is abelian is
contained in one of the cyclotomic fields $\mathbf{Q}(\zeta_n)$. □


9. QUINTIC EQUATIONS

The main motivation behind Galois' work was the problem of solving fifth-degree
equations. We are going to study his solution in this section. A short time earlier,
Abel had shown that the quintic

(9.1) $x^5 + a_4x^4 + a_3x^3 + a_2x^2 + a_1x + a_0$

with variable coefficients $a_i$ could not be solved in terms of radicals, but it remained
to find an explicit polynomial with rational coefficients which couldn't be solved.
Anyhow, because the problem was over 200 years old, interest in it continued. In
the meantime, Galois' ideas have turned out to be much more important than the
question which motivated them.

An expression in terms of radicals may become very complicated, and I don't
know a good notation for a general one. However, it is easy to give a precise
recursive definition. Let $F$ be an arbitrary subfield of the complex numbers. We say that a
complex number $\alpha$ is expressible by radicals over $F$ if there is a tower of subfields
$F = F_0 \subset F_1 \subset \cdots \subset F_r$ of $\mathbf{C}$ such that

(9.2)

(i) $\alpha \in F_r$, and

(ii) for every $j = 1, \ldots, r$, $F_j$ is generated over $F_{j-1}$ by a radical. In other words,
$F_j = F_{j-1}(\beta_j)$, and for some integer $n_j$, $\beta_j^{n_j} \in F_{j-1}$.

This definition is formally similar to the description [Chapter 13 (4.9)] of the real
numbers which can be constructed by ruler and compass. In that description, only
square roots of positive real numbers are allowed.


(9.3) Proposition. Let $\alpha$ be a root of a polynomial $f(x) \in F[x]$ of degree $\le 4$.
Then $\alpha$ is expressible by radicals over $F$.

Proof. For quadratic polynomials, this is the quadratic formula. For cubics,
Cardano's formula gives the solution. Suppose that $f(x)$ is quartic. If $f$ is reducible,
then $\alpha$ is a root of a polynomial of lower degree, and the problem is solved. If not,
then $f$ has distinct roots in a splitting field $K$, so its discriminant $D$ is not zero. Let
$g(x)$ be the resolvent cubic of $f$. We proceed by adjoining the square root $\delta$ of $D$,
obtaining a field $F_1$ (possibly equal to $F$). Next, we use Cardano's formula to solve
the resolvent cubic. This will require a square root extension $F_2$ followed by a cube
root extension $F_3$. At this point, Table (6.14) shows that the Galois group of $K/F_3$ is
a subgroup of the Klein four group. Therefore $K$ can be reached by a sequence of at
most two more square root extensions $F_3 \subset F_4 \subset F_5 = K$. □


The $n$th roots of unity $\zeta_n = e^{2\pi i/n}$ are allowable in an expression by radicals.
Also, if $n = rs$, then $\sqrt[n]{b} = \sqrt[r]{\sqrt[s]{b}}$. So at the cost of adding more steps to the chain
of fields, we may assume that all the roots are $p$th roots, for various prime
integers $p$.

Note that there is a great deal of ambiguity in an expression by radicals,
because there are $n$ choices for each $\sqrt[n]{b}$. The notation $(-3 + \sqrt[5]{2})^{1/4}$ may stand for
any one of 20 complex numbers, so the tower of fields
$\mathbf{Q} \subset \mathbf{Q}(\sqrt[5]{2}) \subset \mathbf{Q}((-3 + \sqrt[5]{2})^{1/4})$ is not uniquely defined. This ambiguity is inherent in the
notation. Since the notation is cumbersome anyhow, we won't bother trying to make it
more precise. We won't use it very much.

(9.4) Proposition. Let $f(x)$ be an irreducible polynomial over a field $F$. If one root
of $f$ in $K$ can be expressed by radicals, so can any other root.

Proof. Suppose that one root $\alpha$ can be expressed by radicals, say using the
tower $F = F_0 \subset \cdots \subset F_r$. Choose a field $L$ which contains $F_r$ and which is a
splitting field of some polynomial of the form $f(x)g(x)$ over $F$. Then $L$ is also the
splitting field of $fg$ over $F(\alpha)$. Let $\alpha'$ be a root of $f$ in another field $K'$, and let $L'$ be a
splitting field of $fg$ over $F(\alpha')$. Then we can extend the isomorphism
$F(\alpha) \to F(\alpha')$ to an isomorphism $\varphi: L \to L'$ (5.2). The tower of fields
$F = \varphi(F_0) \subset \cdots \subset \varphi(F_r)$ shows that $\alpha'$ is expressible by radicals. □

(9.5) Proposition. Let $\alpha$ be a complex number which can be expressed by radicals
over $F$. Then a tower of fields $F = F_0 \subset \cdots \subset F_r = K$ can be found so that the
conditions (i) and (ii) of (9.2) hold and, in addition,

(iii) for each $j$, $F_j$ is a Galois extension of $F_{j-1}$ and the Galois group $G(F_j/F_{j-1})$ is a
cyclic group.

Proof. Consider the tower given in the definition (9.2), in which
$F_r = F(\beta_1, \ldots, \beta_r)$. As we have remarked, we may assume that $\beta_j^{p_j} \in F_{j-1}$ for some
prime integer $p_j$. Let $\zeta_{p_j} = e^{2\pi i/p_j}$ be the $p_j$-th root of 1. We form a new chain of
fields by adjoining the elements $\{\zeta_{p_1}, \ldots, \zeta_{p_r}; \beta_1, \ldots, \beta_r\}$ in that order. Theorem (7.4)
and Proposition (8.2) show that each of these extensions is Galois, with cyclic Galois
group. Some of the extensions in this tower may be trivial because of redundancy. If
so, we shorten the chain. Since the last field $F(\{\zeta_{p_j}\}, \{\beta_j\})$ in this chain contains $F_r$, it
contains $\alpha$. □


Let us consider the Galois group of a product of polynomials $f(x)g(x)$ over $F$.
Let $K'$ be a splitting field of $fg$. Then $K'$ contains a splitting field $K$ of $f$, because $f$
factors into linear factors in $K'$. Similarly, $K'$ contains a splitting field $F'$ of $g$. So
we have a diagram of fields

(9.6)
              K'
            /    \
           K      F'
            \    /
              F

(9.7) Proposition. With the above notation, let $G = G(K/F)$, $H = G(F'/F)$,
and $\mathcal{G} = G(K'/F)$.

(a) $G$ and $H$ are quotients of $\mathcal{G}$.

(b) $\mathcal{G}$ is isomorphic to a subgroup of the product group $G \times H$.

Proof. The first assertion follows from the fact that $K$ and $F'$ are intermediate
fields which are Galois extensions of $F$ (5.7b). Let us denote the canonical
homomorphisms $\mathcal{G} \to G$, $\mathcal{G} \to H$ by subscripts: $\sigma \mapsto \sigma_f$ and $\sigma \mapsto \sigma_g$. Then $\sigma_f$
describes the way that $\sigma$ operates on the roots of $f$, and $\sigma_g$ describes the way it operates
on the roots of $g$. We map $\mathcal{G}$ to $G \times H$ by $\sigma \mapsto (\sigma_f, \sigma_g)$. If $\sigma_f$ and $\sigma_g$ are both the
identity, then $\sigma$ operates trivially on the roots of $fg$, and hence $\sigma = 1$. This shows
that the map $\mathcal{G} \to G \times H$ is injective and that $\mathcal{G}$ is isomorphic to a subgroup of
$G \times H$. □

(9.8) Proposition. Let $f$ be a polynomial over $F$ whose Galois group $G$ is a simple
nonabelian group. Let $F'$ be a Galois extension of $F$, with abelian Galois group. Let
$K'$ be a splitting field of $f$ over $F'$. Then the Galois group $G(K'/F')$ is isomorphic
to $G$.

This proposition is a key point. It tells us that if the Galois group of $f$ is a simple
nonabelian group, then we will not make any progress toward solving for its roots if
we replace $F$ by an abelian extension $F'$.

Proof of Proposition (9.8). We first reduce ourselves to the case that $[F' : F]$
is a prime number. To do this, we suppose that the proposition has been proved in that
case, and we choose a cyclic quotient group $H$ of $G(F'/F)$ of prime order. Such a
quotient exists because $G(F'/F)$ is abelian. This quotient determines an
intermediate field $F_1 \subset F'$ which is a Galois extension of $F$, and such that $G(F_1/F) = H$
(5.7). Let $K_1$ be the splitting field of $f$ over $F_1$. Then since $[F_1 : F]$ is a prime,
$G(K_1/F_1) = G$. So we may replace $F$ by $F_1$ and $K$ by $K_1$. Induction on $[F' : F]$ will
complete the proof.

So we may assume that $[F' : F] = p$ and that $H = G(F'/F)$ is a cyclic group
of order $p$. The splitting field $K'$ will contain a splitting field of $f$ over $F$, call it $K$.
We are then in the situation of Proposition (9.7). So the Galois group $\mathcal{G} = G(K'/F)$
is a subgroup of $G \times H$, and it maps surjectively to $G$. It follows that $|G|$ divides
$|\mathcal{G}|$, and $|\mathcal{G}|$ divides $|G \times H| = p|G|$. If $|G| = |\mathcal{G}|$, then counting degrees shows
that $K' = K$. In this case, $K$ contains the Galois extension $F'$, and hence $H$ is a
quotient of $G$ (5.7b). Since $G$ is a nonabelian simple group, this is impossible. The only
remaining possibility is that $\mathcal{G} = G \times H$. Applying the Main Theorem to the chain
of fields $F \subset F' \subset K'$, we conclude that $G(K'/F') = G$, as required. □

(9.9) Theorem. The roots of a quintic polynomial $f(x)$ whose Galois group is $S_5$
or $A_5$ can not be expressed by radicals over $F$.

Proof. Let $K$ be a splitting field of $f$. If $G = S_5$, then the discriminant of $f$ is
not a square in $F$. In that case, we replace $F$ by $F(\delta)$, where $\delta$ is a square root of the
discriminant in $K$. The Galois group $G(K/F(\delta))$ is $A_5$. Obviously, it is enough to
show that the roots of $f$ can not be expressed by radicals over the larger field $F(\delta)$.
This reduces the case that the group is $S_5$ to the case that it is $A_5$.

Suppose that the Galois group of $f$ is $A_5$ but that some root $\alpha$ of $f$ is expressible
by radicals over $F$. Say that $\alpha \in F_r$, where $F_r$ is the end of a chain of field
extensions $F = F_0 \subset \cdots \subset F_r$, each extension in the chain being Galois, with a cyclic
Galois group. Now since the Galois group of $f$ over $F$ is a simple group, Proposition
(9.8) shows inductively that for each $i$, the Galois group of $f$ over $F_i$ is $A_5$ too. On the
other hand, since it has a root $\alpha$ in $F_r$, the polynomial $f$ will not remain irreducible
over that field. Therefore the Galois group of $f$ over $F_r$ will not operate transitively on
the five roots of $f$ in a splitting field. In particular, the Galois group can not be the
alternating group. This is a contradiction, which shows that the roots of $f$ are not
expressible by radicals. □

We will now exhibit a specific quintic polynomial over $\mathbf{Q}$ whose Galois group
is $S_5$. The facts that 5 is prime and that the Galois group $G$ acts transitively on the
roots $\{\alpha_1, \ldots, \alpha_5\}$ limit the possible Galois groups greatly. For, since the action is
transitive, $|G|$ is divisible by 5. Thus $G$ contains an element of order 5. The only
elements of order 5 in $S_5$ are cyclic permutations such as $\sigma = (12345)$.

(9.10) Lemma. If $G$ contains a transposition, then $G = S_5$.

Proof. By transposition $\tau$ we mean, as always, a permutation which
interchanges two indices. We may assume that $G$ contains the cyclic permutation $\sigma$
above. Renumbering if necessary, we may assume that $\tau$ acts as $(1\,i)$. We replace $\sigma$
by $\sigma^{i-1}$ and renumber again, to reduce to the case that $\tau$ is the transposition $(12)$. It
remains only to verify that $\sigma$ and $\tau$ generate $S_5$, which is left as an exercise. □
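The final step of the proof can be confirmed by brute force: close the two permutations under composition and count the result. The sketch below represents a permutation of the indices 0 through 4 as a tuple of images. The representation and the function names are choices made for this illustration, and the computation is a check rather than a substitute for the exercise.

```python
def compose(s, t):
    """The permutation 'first t, then s', with permutations given as tuples of images."""
    return tuple(s[t[i]] for i in range(len(t)))


def generated_group(generators):
    """Closure of the generators under composition.

    In a finite group, inverses are positive powers, so this closure is the generated subgroup.
    """
    identity = tuple(range(len(generators[0])))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


sigma = (1, 2, 3, 4, 0)   # the 5-cycle (12345), written on the indices 0..4
tau = (1, 0, 2, 3, 4)     # the transposition (12)
print(len(generated_group([sigma, tau])))
```

A count of 120 = 5! means that the two permutations generate the whole symmetric group.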

(9.11) Corollary. Suppose that the irreducible polynomial (9.1) has roots
$\{\alpha_1, \ldots, \alpha_5\}$, and let $K$ be its splitting field. If $F(\alpha_1, \alpha_2, \alpha_3) < K$, then $G(K/F)$ is the
symmetric group $S_5$.

For let $F' = F(\alpha_1, \alpha_2, \alpha_3)$. The only nontrivial permutation fixing $\alpha_1, \alpha_2, \alpha_3$ is the
transposition $(45)$. If $F' \neq K$, this permutation must be in $G(K/F')$. Thus $G(K/F)$
contains a transposition. □

(9.12) Corollary. Let $f(x)$ be an irreducible quintic polynomial over $\mathbf{Q}$ with
exactly three real roots. Then its Galois group is the symmetric group, and hence its
roots can not be expressed by radicals.

For, call the real roots $\alpha_1, \alpha_2, \alpha_3$. Then $\mathbf{Q}(\alpha_1, \alpha_2, \alpha_3) \subset \mathbf{R}$, but since $\alpha_4, \alpha_5$ are not
real, $K$ is not a subfield of $\mathbf{R}$. So we can apply Corollary (9.11) to conclude that the
Galois group of $f$ is $S_5$. By Theorem (9.9), the roots of $f$ can not be expressed by
radicals. □

(9.13) Example. The polynomial $x^5 - 16x = x(x^2 - 4)(x^2 + 4)$ has three real
roots, but of course it is not irreducible. But we can add a small constant without
changing the number of real roots. This is seen by looking at the graph of the
polynomial. For instance,

$x^5 - 16x + 2$

still has three real roots, and it is irreducible by the Eisenstein Criterion [Chapter 10
(4.9)]. So its roots can not be expressed by radicals over $\mathbf{Q}$.
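Both claims in the example can be checked with integer arithmetic. The sketch below counts sign changes of the polynomial at integer points, which gives a lower bound on the number of real roots, and tests the Eisenstein condition at the prime 2. The helper names are illustrative. The matching upper bound comes from calculus, not from the code: the derivative $5x^4 - 16$ has only two real zeros, so by Rolle's theorem the polynomial has at most three real roots.

```python
def f(x):
    return x**5 - 16 * x + 2


def eisenstein(coeffs, p):
    """Eisenstein's criterion at p, for integer coefficients listed from highest degree down."""
    lead, *rest = coeffs
    return (lead % p != 0
            and all(c % p == 0 for c in rest)
            and rest[-1] % (p * p) != 0)


def sign_changes(values):
    """Number of consecutive pairs of values with strictly opposite signs."""
    return sum(1 for a, b in zip(values, values[1:]) if a * b < 0)


samples = [f(x) for x in range(-3, 4)]
print(samples, sign_changes(samples))
print(eisenstein([1, 0, 0, 0, -16, 2], 2))
```

Three sign changes together with the bound from Rolle's theorem give exactly three real roots, so Corollary (9.12) applies.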


Il paraît après cela qu'il n'y a aucun fruit à tirer
de la solution que nous proposons.

Evariste Galois

EXERCISES

1. The Main Theorem of Galois Theory

1. Determine the irreducible polynomial for $i + \sqrt{2}$ over $\mathbf{Q}$.

2. Prove that the set $(1, i, \sqrt{2}, i\sqrt{2})$ is a basis for $\mathbf{Q}(i, \sqrt{2})$ over $\mathbf{Q}$.

3. Determine the intermediate fields between $\mathbf{Q}$ and $\mathbf{Q}(\sqrt{2}, \sqrt{3})$.

4. Determine the intermediate fields of an arbitrary biquadratic extension without appealing 
to the Main Theorem. 

5. Prove that the automorphism $\mathbf{Q}(\sqrt{2}) \to \mathbf{Q}(\sqrt{2})$ sending $\sqrt{2}$ to $-\sqrt{2}$ is discontinuous.

6. Determine the degree of the splitting field of the following polynomials over $\mathbf{Q}$.

(a) $x^4 - 1$ (b) $x^3 - 2$ (c) $x^4 + 1$

7. Let $\alpha$ denote the positive real fourth root of 2. Factor the polynomial $x^4 - 2$ into
irreducible factors over each of the fields $\mathbf{Q}$, $\mathbf{Q}(\sqrt{2})$, $\mathbf{Q}(\sqrt{2}, i)$, $\mathbf{Q}(\alpha)$, $\mathbf{Q}(\alpha, i)$.

8. Let $\zeta = e^{2\pi i/5}$.

(a) Prove that $K = \mathbf{Q}(\zeta)$ is a splitting field for the polynomial $x^5 - 1$ over $\mathbf{Q}$, and
determine the degree $[K : \mathbf{Q}]$.

(b) Without using Theorem (1.11), prove that $K$ is a Galois extension of $\mathbf{Q}$, and
determine its Galois group.

9. Let $K$ be a quadratic extension of the form $F(\alpha)$, where $\alpha^2 = a \in F$. Determine all
elements of $K$ whose squares are in $F$.

10. Let $K = \mathbf{Q}(\sqrt{2}, \sqrt{3}, \sqrt{5})$. Determine $[K : \mathbf{Q}]$, prove that $K$ is a Galois extension of $\mathbf{Q}$,
and determine its Galois group.

11. Let $K$ be the splitting field over $\mathbf{Q}$ of the polynomial
$f(x) = (x^2 - 2x - 1)(x^2 - 2x - 7)$. Determine $G(K/\mathbf{Q})$, and determine all intermediate
fields explicitly.

12. Determine all automorphisms of the field $\mathbf{Q}(\sqrt[3]{2})$.

13. Let $K/F$ be a finite extension. Prove that the Galois group $G(K/F)$ is a finite group.

14. Determine all the quadratic number fields $\mathbf{Q}[\sqrt{d}]$ which contain a primitive $p$th root of
unity, for some prime $p \neq 2$.

15. Prove that every Galois extension K/F whose Galois group is the Klein four group is 
biquadratic. 

16. Prove or disprove: Let $f(x)$ be an irreducible cubic polynomial in $\mathbf{Q}[x]$ with one real root
$\alpha$. The other roots form a complex conjugate pair $\beta, \bar\beta$, so the field $L = \mathbf{Q}(\beta)$ has an
automorphism $\sigma$ which interchanges $\beta, \bar\beta$.

17. Let $K$ be a Galois extension of a field $F$ such that $G(K/F) \approx C_2 \times C_{12}$. How many
intermediate fields $L$ are there such that (a) $[L : F] = 4$, (b) $[L : F] = 9$,
(c) $G(K/L) \approx C_4$?

18. Let $f(x) = x^4 + bx^2 + c \in F[x]$, and let $K$ be the splitting field of $f$. Prove that
$G(K/F)$ is contained in a dihedral group $D_4$.

19. Let $F = \mathbf{F}_2(u)$ be the rational function field over the field of two elements. Prove that the
polynomial $x^2 - u$ is irreducible in $F[x]$ and that it has two equal roots in a splitting
field.

20. Let $F$ be a field of characteristic 2, and let $K$ be an extension of $F$ of degree 2.

(a) Prove that $K$ has the form $F(\alpha)$, where $\alpha$ is the root of an irreducible polynomial
over $F$ of the form $x^2 + x + a$, and that the other root of this equation is $\alpha + 1$.

(b) Is it true that there is an automorphism of $K$ sending $\alpha \mapsto \alpha + 1$?

2. Cubic Equations 

1. Prove that the discriminant of a real cubic is positive if all the roots are real, and nega¬ 
tive if not. 

2. Determine the Galois groups of the following polynomials.

(a) $x^3 - 2$ (b) $x^3 + 27x - 4$ (c) $x^3 + x + 1$ (d) $x^3 + 3x + 14$
(e) $x^3 - 3x^2 + 1$ (f) $x^3 - 21x + 7$ (g) $x^3 + x^2 - 2x - 1$
(h) $x^3 + x^2 - 2x + 1$

3. Let $f$ be an irreducible cubic polynomial over $F$, and let $\delta$ be the square root of the
discriminant of $f$. Prove that $f$ remains irreducible over the field $F(\delta)$.

4. Let $\alpha$ be a complex root of the polynomial $x^3 + x + 1$ over $\mathbf{Q}$, and let $K$ be a splitting
field of this polynomial over $\mathbf{Q}$.

(a) Is $\sqrt{-3}$ in the field $\mathbf{Q}(\alpha)$? Is it in $K$?

(b) Prove that the field $\mathbf{Q}(\alpha)$ has no automorphism except the identity.

*5. Prove Proposition (2.16) directly for a cubic of the form (2.3), by determining the
formula which expresses $\alpha_2$ in terms of $\alpha_1, \delta, p, q$ explicitly.

6. Let $f \in \mathbf{Q}[x]$ be an irreducible cubic polynomial which has exactly one real root, and let
$K$ be its splitting field over $\mathbf{Q}$. Prove that $[K : \mathbf{Q}] = 6$.

7. When does the polynomial $x^3 + px + q$ have a multiple root?

8. Determine the coefficients $p, q$ which are obtained from the general cubic (2.1) by the
substitution (2.2).

9. Prove that the discriminant of the cubic $x^3 + px + q$ is $-4p^3 - 27q^2$.

3. Symmetric Functions

1. Derive the expression (3.10) for the discriminant of a cubic by the method of undeter¬ 
mined coefficients. 

2. Let $f(u)$ be a symmetric polynomial of degree $d$ in $u_1, \ldots, u_n$, and let
$f^0(u_1, \ldots, u_{n-1}) = f(u_1, \ldots, u_{n-1}, 0)$. Say that $f^0(u) = g(s^0)$, where $s_i^0$ are the
elementary symmetric functions in $u_1, \ldots, u_{n-1}$. Prove that if $n > d$, then $f(u) = g(s)$.

3. Compute the discriminant of a quintic polynomial of the form x 5 + ax + b. 

4. With each of the following polynomials，determine whether or not it is a symmetric func¬ 
tion, and if so, write it in terms of the elementary symmetric functions. 

(a) $u_1^2u_2 + u_2^2u_1$ $(n = 2)$

(b) $u_1^2u_2 + u_2^2u_3 + u_3^2u_1$ $(n = 3)$

(c) $(u_1 + u_2)(u_2 + u_3)(u_1 + u_3)$ $(n = 3)$

(d) $u_1^3u_2 + u_2^3u_3 + u_3^3u_1 - u_1u_2^3 - u_2u_3^3 - u_3u_1^3$ $(n = 3)$

(e) $u_1^3 + u_2^3 + \cdots + u_n^3$

5. Find two natural bases for the ring of symmetric functions, as free module over the ring $R$.

*6. Define the polynomials $w_1, \ldots, w_n$ in variables $u_1, \ldots, u_n$ by $w_k = u_1^k + \cdots + u_n^k$.

(a) Prove Newton's identities: $w_k - s_1w_{k-1} + s_2w_{k-2} - \cdots \pm s_{k-1}w_1 \mp ks_k = 0$.

(b) Do $w_1, \ldots, w_n$ generate the ring of symmetric functions?

7. Let $f(x) = x^3 + a_2x^2 + a_1x + a_0$. Prove that the substitution $x = x_1 - (a_2/3)$ does not
change the discriminant of a cubic polynomial.

8. Prove that $[F(u) : F(s)] = n!$ by induction, directly from the definitions.

9. Let $u_1, \ldots, u_n$ be variables and let $D_1$ denote the discriminant. Define

$D_2 = \sum_k \prod_{i<j;\ i,j \neq k} (u_i - u_j)^2.$

(a) Prove that $D_2$ is a symmetric polynomial, and compute its expression in terms of the
elementary symmetric polynomials for the cases $n = 2, 3$.

(b) Let $\alpha_1, \ldots, \alpha_n$ be elements of a field of characteristic zero. Prove that
$D_1(\alpha_1, \ldots, \alpha_n) = D_2(\alpha_1, \ldots, \alpha_n) = 0$ if and only if the number of distinct elements in the set
$\{\alpha_1, \ldots, \alpha_n\}$ is $\le n - 2$.

10. Compute the discriminants of the polynomials given in Section 2, exercise 2. 

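*11. (Vandermonde determinant) (a) Prove that the determinant of the matrix

$$\begin{pmatrix} 1 & u_1 & u_1^2 & \cdots & u_1^{n-1} \\ 1 & u_2 & u_2^2 & \cdots & u_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & u_n & u_n^2 & \cdots & u_n^{n-1} \end{pmatrix}$$

is a constant multiple of $\delta(u)$.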

(b) Determine the constant. 

4. Primitive Elements

1. Let G be a group of automorphisms of a field K. Prove that the fixed elements $K^G$ form
a subfield of K.

2. Let $\alpha = \sqrt[3]{2}$, $\zeta = \frac{1}{2}(-1 + \sqrt{-3})$, $\beta = \alpha\zeta$.

(a) Prove that for all $r \in \mathbb{Q}$, $\gamma = \alpha + r\beta$ is the root of a sixth-degree polynomial of
the form $x^6 + ax^3 + b$.

(b) Prove that the irreducible polynomial for $\alpha + \beta$ is cubic.

(c) Prove that $\alpha - \beta$ has degree 6 over $\mathbb{Q}$.

3. For each of the following sets of automorphisms of the field of rational functions $\mathbb{C}(y)$,
determine the group of automorphisms which they generate, and determine the fixed
field explicitly.

(a) $\sigma(y) = y^{-1}$ (b) $\sigma(y) = iy$ (c) $\sigma(y) = -y$, $\tau(y) = y^{-1}$ (d) $\sigma(y) = \zeta y$, $\tau(y) = y^{-1}$,
where $\zeta = e^{2\pi i/3}$ (e) $\sigma(y) = iy$, $\tau(y) = y^{-1}$

4. (a) Show that the automorphisms $\sigma(y) = (y + i)/(y - i)$, $\tau(y) = i(y - 1)/(y + 1)$
of $\mathbb{C}(y)$ generate a group isomorphic to the alternating group $A_4$.

*(b) Determine the fixed field of this group.

*5. Let F be a finite field, and let $f(x)$ be a nonconstant polynomial whose derivative is the
zero polynomial. Prove that f is not irreducible over F.

5. Proof of the Main Theorem

1. Let $K = \mathbb{Q}(\alpha)$, where $\alpha$ is a root of the polynomial $x^3 + 2x + 1$, and let
$g(x) = x^3 + x + 1$. Does $g(x)$ have a root in K?

2. Let $f \in F[x]$ be a polynomial of degree n, and let K be a splitting field for f. Prove that
$[K : F]$ divides $n!$.

3. Let G be a finite group. Prove that there exists a field F and a Galois extension K of F 
whose Galois group is G. 

4. Assume it known that $\pi$ and e are transcendental numbers. Let K be the splitting field of
the polynomial $x^3 + \pi x + 6$ over the field $F = \mathbb{Q}(\pi)$.

(a) Prove that $[K : F] = 6$.

(b) Prove that K is isomorphic to the splitting field of $x^3 + ex + 6$ over $\mathbb{Q}(e)$.

5. Prove the isomorphism $F[x]/(f(x)) \cong \tilde{F}[x]/(\tilde{f}(x))$ used in the proof of Lemma (5.1)
formally, using the universal property of the quotient construction.

6. Prove Corollary (5.5).

7. Let $f(x)$ be an irreducible cubic polynomial over $\mathbb{Q}$ whose Galois group is $S_3$. Determine
the possible Galois groups of the polynomial $(x^3 - 1) \cdot f(x)$.

8. Consider the diagram of fields $F \subset K \subset K'$, $F \subset F' \subset K'$,
in which K is a Galois extension of F, and $K'$ is generated over F by K and $F'$. Prove
that $K'$ is a Galois extension of $F'$ and that its Galois group is isomorphic to a subgroup
of $G(K/F)$.

9. Let $K \supset L \supset F$ be fields. Prove or disprove:

(a) If K/F is Galois, then K/L is Galois.

(b) If K/F is Galois, then L/F is Galois.

(c) If L/F and K/L are Galois, then K/F is Galois.

10. Let K be a splitting field of an irreducible cubic polynomial $f(x)$ over a field F whose
Galois group is $S_3$. Determine the group $G(F(\alpha)/F)$ of automorphisms of the extension
$F(\alpha)$.

11. Let K/F be a Galois extension whose Galois group is the symmetric group $S_3$. Is it true
that K is the splitting field of an irreducible cubic polynomial over F?

12. Let K/F be a field extension of characteristic $p \neq 0$, and let $\alpha$ be a root in K of an
irreducible polynomial $f(x) = x^p - x - a$ over F.

(a) Prove that $\alpha + 1$ is also a root of $f(x)$.

(b) Prove that the Galois group of f over F is cyclic of order p.

6. Quartic Equations

1. Compute the discriminant of the quartic polynomial $x^4 + 1$, and determine its Galois
group over $\mathbb{Q}$.

2. Let K be the splitting field of an irreducible quartic polynomial $f(x)$ over F, and let the
roots of $f(x)$ in K be $\alpha_1, \alpha_2, \alpha_3, \alpha_4$. Also assume that the resolvent cubic $g(x)$ has a root,
say $\beta_1 = \alpha_1\alpha_2 + \alpha_3\alpha_4$. Express the root $\alpha_1$ explicitly in terms of a succession of square
roots.

3. What can you say about the Galois group of an irreducible quartic polynomial over Q 
which has exactly two real roots? 

4. Suppose that a real quartic polynomial has a positive discriminant. What can you say 
about the number of real roots? 

5. Let K be the splitting field of a reducible quartic polynomial with distinct roots over a
field F. What are the possible Galois groups of K/F?

6. What are the possible Galois groups over $\mathbb{Q}$ of an irreducible quartic polynomial $f(x)$
whose discriminant is negative?

7. Let g be the resolvent cubic of an irreducible quartic polynomial $f \in F[x]$. Determine
the possible Galois groups of g over F, and in each case, say what you can about the
Galois group of f.

8. Let K be the splitting field of a polynomial $f \in F[x]$ with distinct roots $\alpha_1,\dots,\alpha_n$, and
let $G = G(K/F)$. Then G may be regarded as a subgroup of the symmetric group $S_n$.
Prove that a change of numbering of the roots changes G to a conjugate subgroup.

9. Let $\alpha_1,\dots,\alpha_4$ be the roots of a quartic polynomial. Discuss the symmetry of the elements
$\alpha_1\alpha_2$ and $\alpha_1 + \alpha_2$ along the lines of the discussion in the text.

10. Find a quartic polynomial over $\mathbb{Q}$ whose Galois group is (a) $S_4$, (b) $D_4$, (c) $C_4$.

11. Let $\alpha$ be the real root of a quartic polynomial f over $\mathbb{Q}$. Assume that the resolvent cubic
is irreducible. Prove that $\alpha$ can’t be constructed by ruler and compass.

12. Determine the Galois groups of the following polynomials over $\mathbb{Q}$.

(a) $x^4 + 4x^2 + 2$ (b) $x^4 + 2x^2 + 4$ (c) $x^4 + 4x^2 - 5$ (d) $x^4 - 2$ (e) $x^4 + 2$

(f) $x^4 + 1$ (g) $x^4 + x + 1$ (h) $x^4 + x^3 + x^2 + x + 1$ (i) $x^4 + x^2 + 4$

13. Compute the discriminant of the quartic polynomial $x^4 + ax + b$, using the formula in
Lemma (8.7).

*14. Let f be an irreducible quartic polynomial over F of the form $x^4 + rx + s$, and let
$\alpha_i$ be the roots of f in a splitting field K. Let $\eta = \alpha_1\alpha_2$.

(a) Prove that $\eta$ is the root of a sextic polynomial $h(x)$ with coefficients in F.

(b) Assume that the six products $\alpha_i\alpha_j$ are distinct. Prove that $h(x)$ is irreducible, or else
it has an irreducible quadratic factor.

(c) Describe the possibilities for the Galois group G = G(K/F) in the following three 
cases: h is irreducible, h is a product of an irreducible quadratic and an irreducible 
quartic, and h is the product of three irreducible quadratics. 

(d) Describe the situation when some of the products are equal. 

15. Let K be the splitting field of the polynomial $x^4 - 3$ over $\mathbb{Q}$.

(a) Prove that $[K : \mathbb{Q}] = 8$ and that K is generated by i and a single root $\alpha$ of the
polynomial.

(b) Prove that the Galois group of $K/\mathbb{Q}$ is dihedral, and describe the operation of the
elements of G on the generators of K explicitly.

16. Let K be the splitting field over $\mathbb{Q}$ of the polynomial $x^4 - 2x^2 - 1$. Determine the
Galois group G of $K/\mathbb{Q}$, find all intermediate fields, and match them up with the
subgroups of G.

17. Let $f(x)$ be a quartic polynomial. Prove that the discriminants of f and of its resolvent
cubic are equal.

18. Prove the irreducibility of the polynomial (6.17) and of its resolvent cubic.

19. Let K be the splitting field of the reducible polynomial $(x - 1)^2(x^2 + 1)$ over $\mathbb{Q}$. Prove
that $\delta \in \mathbb{Q}$, but that $G(K/\mathbb{Q})$ is not contained in the alternating group.

20. Let $f(x)$ be a quartic polynomial with distinct roots, whose resolvent cubic $g(x)$ splits
completely in the field F. What are the possible Galois groups of $f(x)$?

21. Let C = e 277 " 3 be the cube root of 1, let a = and let K be the splitting field 

of the irreducible polynomial for a over . Determine the possible Galois groups of K 
over O(^). 

22. Let $\mathcal{H}$ be a subgroup of the symmetric group $S_n$. Given any monomial m, we can form
the polynomial $p(u) = \sum_{\sigma \in \mathcal{H}} \sigma m$. Show that if $m = u_1u_2^2u_3^3 \cdots u_{n-1}^{n-1}$, then $p(u)$ is
partially symmetric for $\mathcal{H}$, that is, it is fixed by the permutations in $\mathcal{H}$ but not by any
other permutations.

23. Let $p(u)$ be the polynomial formed as in the last problem, with $\mathcal{H} = A_n$. Then the orbit
of $p(u)$ contains two elements, say $p(u)$, $q(u)$. Prove that $p(u) - q(u) = \pm\delta(u)$.

24. Determine the possible Galois groups of a reducible quartic equation of the form
$x^4 + bx^2 + c$, assuming that the quadratic $y^2 + by + c$ is irreducible.

25. Compute the discriminant of the polynomial $x^4 + rx + s$ by evaluating the
discriminants of $x^4 - x$ and $x^4 - 1$.

26. Use the substitution to determine the discriminant of the polynomial 

x 4 + ax 3 + b. 

27. Determine the resolvent cubic of the polynomials (a) $x^4 + rx + s$ and
(b) $x^4 + a_1x^3 + a_2x^2 + a_3x + a_4$.

28. Let $f(x) = x^4 - 2rx^2 + (r^2 - s^2v)$, with $r, s, v \in F$. Assume that f is irreducible, and
let G denote its Galois group. Let $L = F(\sqrt{v}, \delta)$, where $\delta^2 = D$. Prove each statement.

(a) $L(\alpha) = K$

(b) If $[L : F] = 4$, then $G = D_4$.

(c) If $[L : F] = 2$ and $\delta \notin F$, then $G = C_4$.

29. Determine the Galois groups of the last two examples of (6.5).

30. Determine the action of the Galois group G on the roots $\{\alpha, \alpha', -\alpha, -\alpha'\}$ (6.7)
explicitly, assuming that (a) $G = C_4$, (b) $G = D_4$.

31. Determine whether or not the following nested radicals can be written in terms of
unnested ones, and if so, find an expression.

(a) $\sqrt{2 + \sqrt{11}}$ (b) $\sqrt{6 + \sqrt{11}}$ (c) $\sqrt{11 + 6\sqrt{2}}$ (d) $\sqrt{11 + \sqrt{6}}$

*32. Let K be the splitting field of a quartic polynomial $f(x)$ over $\mathbb{Q}$, whose Galois group is
$D_4$, and let $\alpha$ be a real root of $f(x)$ in K. Decide whether or not $\alpha$ can be constructed by
ruler and compass if (a) all four roots of f are real, (b) f has two real roots.

33. Can the roots of the polynomial jc 4 + — 5 be constructed by ruler and compass? 

7. Kummer Extensions 

1. Suppose that a Galois extension K/F has the form $K = F(\alpha)$ and that for some integer n,
$\alpha^n \in F$. What can you say about the Galois group of K/F?

*2. Let a be an element of a field F, and let p be a prime. Suppose that the polynomial
$x^p - a$ is reducible in $F[x]$. Prove that it has a root in F.

3. Let F be a subfield of $\mathbb{C}$ which contains i, and let K be a Galois extension of F whose
group is $C_4$. Is it true that K has the form $F(\alpha)$, where $\alpha^4 \in F$?

4. Let $f(x) = x^3 + px + q$ be an irreducible polynomial over a field F, with roots
$\alpha_1, \alpha_2, \alpha_3$. Let $\beta = \alpha_1 + \zeta\alpha_2 + \zeta^2\alpha_3$, where $\zeta = e^{2\pi i/3}$. Show that $\beta$ is an eigenvector
of $\sigma$ for the cyclic permutation of the roots unless $\beta = 0$, and compute $\beta^3$ explicitly in
terms of $p, q, \delta, \zeta$.

5. Let K be a splitting field of an irreducible polynomial $f(x) \in F[x]$ of degree p whose
Galois group is a cyclic group of order p generated by $\sigma$, and suppose that F contains the
pth root of unity $\zeta = \zeta_p$. Let $\alpha_1, \alpha_2, \dots, \alpha_p$ be the roots of f in K. Show that
$\alpha_1 + \zeta^{\nu}\alpha_2 + \zeta^{2\nu}\alpha_3 + \cdots + \zeta^{(p-1)\nu}\alpha_p$ is an eigenvector of $\sigma$, with eigenvalue $\zeta^{-\nu}$,
unless it is zero.

6. Let $f(x) = x^3 + px + q$ be an irreducible polynomial over a subfield F of the complex
numbers, with complex roots $\alpha = \alpha_1, \alpha_2, \alpha_3$. Let $K = F(\alpha)$.

(a) Express $(6\alpha^2 + 2p)^{-1}$ explicitly, as a polynomial of degree 2 in $\alpha$.

(b) Assume that $\delta = \sqrt{D}$ is in F, so that K contains the other roots of f. Express $\alpha_2$ as a
polynomial in $\alpha = \alpha_1$ and $\delta$.

(c) Prove that $(1, \alpha_1, \alpha_2)$ is a basis of K, as F-vector space.

(d) Let $\sigma$ be the automorphism of K which permutes the three roots cyclically. Write the
matrix of $\sigma$ with respect to the above basis, and find its eigenvalues and eigenvectors.

(e) Let v be an eigenvector with eigenvalue $\zeta = e^{2\pi i/3}$. Prove that if $\sqrt{-3} \in F$ then
$v^3 \in F$. Compute $v^3$ explicitly, in terms of $p, q, \delta, \sqrt{-3}$.

(f) Dropping the assumptions that $\delta$ and $\sqrt{-3}$ are in F, express v in terms of radicals.

(g) Without calculation, determine the element $v'$ which is obtained from v by
interchanging the roles of $\alpha_1, \alpha_2$.

(h) Express the root $\alpha_1$ in terms of radicals.

8. Cyclotomic Extensions 

1. Determine the degree of ^ over the field Q(^ 3 ). 

2. Let $\zeta = \zeta_{13}$, and let $K = \mathbb{Q}(\zeta)$. Determine the intermediate field of degree 3 over $\mathbb{Q}$
explicitly.

3. Let $\zeta = \zeta_{17}$. Determine the succession of square roots which generate the field
$\mathbb{Q}(\zeta + \zeta^{16})$ explicitly.

4. Let $\zeta = \zeta_7$. Determine the degree of the following elements over $\mathbb{Q}$.

⑻ （+ r (b) ^ + r (c) p p + ( 6 

5. Let $\zeta = \zeta_{13}$. Determine the degree of the following elements over $\mathbb{Q}$.

(a) i + C 1 (b) ( + p (C) ( + + r (d) C 2 + c 5 ^ r 

(e) (+ p p + r 2 (f) ^ ^ + r + v 1 (g> (+ + r + r + ( io + ( i2 

6. Let $\zeta = \zeta_{11}$.

(a) Prove that a = ^ + + ^ + generates a field of degree 2 over Q, and 

find its equation. 

(b) Find an element which generates a subfield of degree 5 over Q, and find its equation. 

7. Prove that every quadratic extension of $\mathbb{Q}$ is contained in a cyclotomic extension.

8. Let $K = \mathbb{Q}(\zeta_n)$.

(a) Prove that K is a Galois extension of $\mathbb{Q}$.

(b) Define an injective homomorphism $v\colon G(K/\mathbb{Q}) \to U$ to the group U of units in the
ring $\mathbb{Z}/(n)$.

(c) Prove that this homomorphism is bijective when n = 6, 8, 12. (Actually, this map is
always bijective.)

*9. Let p be a prime, and let a be a rational number which is not a pth power. Let K be a
splitting field of the polynomial $x^p - a$ over $\mathbb{Q}$.

(a) Prove that K is generated over $\mathbb{Q}$ by a pth root $\alpha$ of a and a primitive pth root $\zeta$ of
unity.

(b) Prove that $[K : \mathbb{Q}] = p(p - 1)$.

(c) Prove that the Galois group of $K/\mathbb{Q}$ is isomorphic to the group of invertible 2×2
matrices with entries in $\mathbb{F}_p$ of the form $\begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix}$, and describe the actions of the
elements $\begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}$ on the generators explicitly.


10. Determine the Galois group of the polynomials $x^8 - 1$, $x^{12} - 1$, $x^9 - 1$.

11. (a) Characterize the primes p such that the regular p-gon can be constructed by ruler and
compass.

(b) Extend the characterization to the case of an n-gon, where n is not necessarily prime.

*12. Let $\nu$ be a primitive element modulo a prime p, and let d be a divisor of $p - 1$. Show
how to determine a sum of powers of $\zeta = \zeta_p$ which generates the subfield L of $\mathbb{Q}(\zeta)$ of
degree d over $\mathbb{Q}$, using the list of roots of unity $\{\zeta, \zeta^{\nu}, \zeta^{\nu^2}, \dots, \zeta^{\nu^{p-2}}\}$.


9. Quintic Equations 

1. Determine the transitive subgroups of $S_5$.

2. Let G be the Galois group of an irreducible quintic polynomial. Show that if G contains
an element of order 3, then $G = S_5$ or $A_5$.

*3. Let p be a prime integer, and let G be a p-group. Let H be a proper normal subgroup of
G.

(a) Prove that the normalizer $N(H)$ of H is strictly larger than H.

(b) Prove that H is contained in a subgroup of index p and that that subgroup is normal
in G.

(c) Let K be a Galois extension of $\mathbb{Q}$ whose degree is a power of 2, and such that
$K \subset \mathbb{R}$. Prove that the elements of K can be constructed by ruler and compass.

4. Let $K \supset L \supset F$ be a tower of field extensions of degree 2. Show that K can be
generated over F by the root of an irreducible quartic polynomial of the form $x^4 + bx^2 + c$.

*5. Cardano’s Formula has a peculiar feature: Suppose that the coefficients p, q of the cubic
are real numbers. A real cubic always has at least one real root. However, the square
root appearing in the formula (2.6) will be imaginary if $(q/2)^2 + (p/3)^3 < 0$. In that
case, the real root is displayed in terms of an auxiliary complex number u. This was
considered to be an improper solution in Cardano’s time. Let $f(x)$ be an irreducible cubic
over a subfield F of $\mathbb{R}$, which has three real roots. Prove that no root of f is expressible
by real radicals, that is, that there is no tower $F = F_0 \subset \cdots \subset F_r$ as in (9.2), in which
all the fields are subfields of $\mathbb{R}$.

6. Let $f(x) \in F[x]$ be an irreducible quintic polynomial, and let K be a splitting field for
$f(x)$ over F.

(a) What are the possible Galois groups $G(K/F)$, assuming that the discriminant D is a
square in F?

*(b) What are the possible Galois groups if D is not a square in F?

7. Determine which real numbers a of degree 4 over Q can be constructed with ruler and 
compass in terms of the Galois group of the corresponding polynomial. 

8. Is every Galois extension of degree 10 solvable by radicals? 

*9. Find a polynomial of degree 7 over Q whose Galois group is S 7 . 

Miscellaneous Problems 


1. Let K be a Galois extension of F whose Galois group is the symmetric group $S_4$. What
numbers occur as degrees of elements of K over F?

2. Show without computation that the side length of a regular pentagon inscribed in the unit
circle has degree 2 over $\mathbb{Q}$.

3. (a) The nonnegative real numbers are those having a real square root. Use this fact to
prove that the field $\mathbb{R}$ has no automorphism except the identity.

(b) Prove that $\mathbb{C}$ has no continuous automorphisms except for complex conjugation and
the identity.

4. Let K/F be a Galois extension with Galois group G, and let H be a subgroup of G. Prove
that there exists an element $\beta \in K$ whose stabilizer is H.

*5. (a) Let K be a field of characteristic p. Prove that the Frobenius map $\varphi$ defined by
$\varphi(x) = x^p$ is a homomorphism from K to itself.

(b) Prove that $\varphi$ is an isomorphism if K is a finite field.

(c) Give an example of an infinite field of characteristic p such that $\varphi$ is not an
isomorphism.

(d) Let $K = \mathbb{F}_q$, where $q = p^r$, and let $F = \mathbb{F}_p$. Prove that $G(K/F)$ is a cyclic group of
order r, generated by the Frobenius map $\varphi$.

(e) Prove that the Main Theorem of Galois theory holds for the field extension K/F.

6. Let K be a subfield of $\mathbb{C}$, and let G be its group of automorphisms. We can view G as
acting on the point set K in the complex plane. The action will probably be
discontinuous, but nevertheless, we can define an action on line segments $[\alpha, \beta]$ whose endpoints
are in K, by defining $g[\alpha, \beta] = [g\alpha, g\beta]$. Then G also acts on polygons whose vertices
are in K.

(a) Let $K = \mathbb{Q}(\zeta)$, where $\zeta$ is a primitive fifth root of 1. Find the G-orbit of the regular
pentagon whose vertices are $1, \zeta, \zeta^2, \zeta^3, \zeta^4$.

(b) Let $\alpha$ be the side length of the pentagon of (a). Show that $\alpha^2 \in K$, and find the
irreducible equation for $\alpha$ over $\mathbb{Q}$. Is $\alpha \in K$?

7. A polynomial $f \in F[x_1,\dots,x_n]$ is called ½-symmetric if $f(u_{\sigma 1},\dots,u_{\sigma n}) = f(u_1,\dots,u_n)$
for every even permutation $\sigma$ of the indices, and skew-symmetric if $f(u_{\sigma 1},\dots,u_{\sigma n}) =
(\operatorname{sign}\sigma) f(u_1,\dots,u_n)$ for every permutation $\sigma$.

(a) Prove that the square root of the discriminant $\delta = \prod_{i<j}(u_i - u_j)$ is
skew-symmetric.

(b) Prove that every ½-symmetric polynomial has the form $f + g\delta$, where f, g are
symmetric polynomials.

*8. Let $f(x,y) \in \mathbb{C}[x,y]$ be an irreducible polynomial, which we regard as a polynomial
$f(y)$ in y. Assume that f is cubic as a polynomial in y. Its discriminant D, computed
with regard to the variable y, will be a polynomial in x. Assume that there is a root $x_0$ of
$D(x)$ which is not a multiple root.

(a) Prove that the polynomial $f(x_0, y)$ in y has one simple root and one double root.

(b) Prove that the splitting field K of $f(y)$ over $\mathbb{C}(x)$ has degree 6.

9. Let K be a subfield of $\mathbb{C}$ which is a Galois extension of $\mathbb{Q}$. Prove or disprove: Complex
conjugation carries K to itself, and therefore it defines an automorphism of K.

*10. Let K be a finite extension of a field F, and let $f(x) \in K[x]$. Prove that there is a
nonzero polynomial $g(x) \in K[x]$ such that $f(x)g(x) \in F[x]$.

*11. Let $f(x)$ be an irreducible quartic polynomial in $F[x]$. Let $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ be its roots in a
splitting field K. Assume that the resolvent cubic has a root $\beta = \alpha_1\alpha_2 + \alpha_3\alpha_4$ in F, but
that the discriminant D is not a square in F. According to the text, the Galois group of
K/F is either $C_4$ or $D_4$.

(a) Determine the subgroup H of the group $S_4$ of permutations of the roots $\alpha_i$ which
stabilizes $\beta$ explicitly. Don’t forget to prove that no permutations other than those you
list fix $\beta$.

(b) Let $\gamma = \alpha_1\alpha_2 - \alpha_3\alpha_4$ and $\epsilon = \alpha_1 + \alpha_2 - \alpha_3 - \alpha_4$. Describe the action of H on these
elements.

(c) Prove that $\gamma^2$ and $\epsilon^2$ are in F.

(d) Let $\delta$ be the square root of the discriminant. Prove that if $\gamma \neq 0$, then $\delta\gamma$ is a square
in F if and only if $G = C_4$. Similarly, prove that if $\epsilon \neq 0$, then $\delta\epsilon$ is a square in F if
and only if $G = C_4$.

(e) Prove that $\gamma$ and $\epsilon$ can’t both be zero.

*12. Let $F = \mathbb{F}_p(u, v)$ be a rational function field in two variables over the field $\mathbb{F}_p$ with p
elements, and let $K = F(\alpha, \beta)$, where $\alpha, \beta$ are roots of the polynomials $x^p - u$ and
$x^p - v$ respectively. Prove the following.

(a) The extension K/F has no primitive element.

(b) The elements $\gamma = \beta + c\alpha$, where $c \in F$, generate infinitely many different
intermediate fields L.

*13. Let K be a field with $p^r$ elements. Prove that the Frobenius map defined by $\varphi(x) = x^p$ is
a linear transformation of K, when K is viewed as a vector space over the prime field $F = \mathbb{F}_p$,
and determine its eigenvectors and eigenvalues.


Wie weit diese Methoden reichen werden, muss erst 

die Zukunft zeigen. 

Emmy Noether


Background Material

Historically speaking, it is of course quite untrue
that mathematics is free from contradiction;
non-contradiction appears as a goal to be achieved,
not as a God-given quality that has been granted us once for all.

Nicolas Bourbaki 


1. SET THEORY


This section reviews some conventions about set theory which are used in this book, 
as well as some facts which will be referred to occasionally. 

First, a remark about definitions: Any definition of a word or a phrase will 
have roughly the form 

(1.1) xxx if @#&$% , 

where xxx is the word which is being defined and @#&$% is its defining property. 
For example, the sentence “An integer n is positive if n > 0” defines the notion of a 
positive integer. In a definition, the word if means if and only if. So in the definition 
of the positive integers, all integers which don’t satisfy the requirement n > 0 are 
ruled out. 

The notation

(1.2) $\{s \in S \mid$ @#&$% $\}$

stands for the subset of S consisting of all elements s such that @#&$% is true.
Thus if $\mathbb{Z}$ denotes the set of all integers, then $\mathbb{N} = \{n \in \mathbb{Z} \mid n > 0\}$ describes
the set of positive integers or natural numbers.

Elements $a_1,\dots,a_n$ of a set are said to be distinct if no two of them are equal.
A map $\varphi$ from a set S to a set T is any function whose domain of definition is S
and whose range is T. The words function and map are used synonymously. We
require that a function be single-valued. This means that every element $s \in S$ must
have a uniquely determined image $\varphi(s) \in T$. The range T of $\varphi$ is not required to be
the set of values of the function. By definition of a function, every image element
$\varphi(s)$ is contained in T, but we allow the possibility that some elements $t \in T$ are
not taken on by the function at all. We also take the domain and range of a function 
as part of its definition. If we restrict the domain to a subset, or if we extend the 
range, then the function obtained is considered to be different. 

The domain and range of a map may also be described by the use of an arrow.
Thus the notation $\varphi\colon S \to T$ tells us that $\varphi$ is a map from S to T. The statement that
$t = \varphi(s)$ may be described by a wiggly arrow: $s \rightsquigarrow t$ means that the element $s \in S$ is
sent to $t \in T$ by the map under consideration. For example, the map $\varphi\colon \mathbb{Z} \to \mathbb{Z}$ such
that $\varphi(n) = 2n + 1$ is described by $n \rightsquigarrow 2n + 1$.

The image of the map $\varphi$ is the subset of T of elements which have the form
$\varphi(s)$ for some $s \in S$. It will often be denoted by im $\varphi$, or by $\varphi(S)$:

(1.3) im $\varphi = \{t \in T \mid t = \varphi(s)$ for some $s \in S\}$.

In case im $\varphi$ is the whole range T, the map is said to be surjective. Thus $\varphi$ is
surjective if every $t \in T$ has the form $\varphi(s)$ for some $s \in S$.

The map $\varphi$ is called injective if distinct elements $s_1, s_2$ of S have distinct
images, that is, if $s_1 \neq s_2$ implies that $\varphi(s_1) \neq \varphi(s_2)$. A map which is both injective
and surjective is called a bijective map. A permutation of a set S is a bijective map
from S to itself.

Let $\varphi\colon S \to T$ and $\psi\colon T \to S$ be two maps. Then $\psi$ is called an inverse function
of $\varphi$ if both of the composed maps $\varphi \circ \psi\colon T \to T$ and $\psi \circ \varphi\colon S \to S$ are the identity
maps, that is, if $\varphi(\psi(t)) = t$ for all $t \in T$ and $\psi(\varphi(s)) = s$ for all $s \in S$. The
inverse function is often denoted by $\varphi^{-1}$.

(1.4) Proposition. A map $\varphi\colon S \to T$ has an inverse function if and only if it is
bijective.

Proof. Assume that $\varphi$ has an inverse function $\psi$, and let us show that $\varphi$ is both
surjective and injective. Let t be any element of T, and let $s = \psi(t)$. Then $\varphi(s) =
\varphi(\psi(t)) = t$. So t is in the image of $\varphi$. This shows that $\varphi$ is surjective. Next, let
$s_1, s_2$ be distinct elements of S, and let $t_i = \varphi(s_i)$. Then $\psi(t_i) = s_i$. So $t_1, t_2$ have
distinct images in S, which shows that they are distinct. Therefore $\varphi$ is injective.
Conversely, assume that $\varphi$ is bijective. Then since $\varphi$ is surjective, every element $t \in T$
has the form $t = \varphi(s)$ for some $s \in S$. Since $\varphi$ is injective, there can be only one
such element s. So we define $\psi$ by the following rule: $\psi(t)$ is the unique element
$s \in S$ such that $\varphi(s) = t$. This map is the required inverse function. □
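For finite sets, the construction in this proof can be carried out mechanically. The sketch below is illustrative only: the name `inverse_map` and the choice to represent the inverse as a dictionary are assumptions made here, not notation from the text.

```python
def inverse_map(phi, S, T):
    """Return the inverse of phi: S -> T as a dict {t: s}, or None if phi is not bijective.

    phi is any callable, S and T are finite collections (hypothetical helper).
    """
    T = set(T)
    inv = {}
    for s in S:
        t = phi(s)
        if t not in T:
            raise ValueError("phi(s) must lie in the range T")
        if t in inv:
            # Two distinct elements share an image, so phi is not injective.
            return None
        inv[t] = s
    if set(inv) != T:
        # Some element of T is not taken on, so phi is not surjective.
        return None
    return inv

# The map n -> 2n + 1 restricted to S = {0, 1, 2}, with range T = {1, 3, 5}.
psi = inverse_map(lambda n: 2 * n + 1, [0, 1, 2], [1, 3, 5])
print(psi)
```

Each key t of the returned dictionary is sent to the unique s with phi(s) = t, which is exactly the rule used to define the inverse function in the proof.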

Let $\varphi\colon S \to T$ be a map, and let U be a subset of T. The inverse image of U is
defined to be the set

(1.5) $\varphi^{-1}(U) = \{s \in S \mid \varphi(s) \in U\}$.

This set is defined whether or not $\varphi$ has an inverse function. The notation $\varphi^{-1}$, as
used here, is symbolic.
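A small computation makes the point that the inverse image exists even when no inverse function does. This is a sketch for finite sets; the helper name `inverse_image` is an assumption made for illustration.

```python
def inverse_image(phi, S, U):
    """Inverse image of U under phi: the set of s in S with phi(s) in U."""
    U = set(U)
    return {s for s in S if phi(s) in U}

# Squaring on {-3, ..., 3} is neither injective nor surjective onto the integers,
# yet inverse images are still defined.
S = range(-3, 4)
print(inverse_image(lambda s: s * s, S, {1, 4}))
print(inverse_image(lambda s: s * s, S, {2}))
```

The second call returns the empty set, since 2 is not a value of the squaring map on this domain.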

A set S is called finite if it contains finitely many elements. If so, the number of
its elements, sometimes called its cardinality, will be denoted by $|S|$. We will also
call this number the order of S. If S is infinite, we write $|S| = \infty$. The following
theorem is quite elementary, but it is a very important principle.

(1.6) Theorem. Let $\varphi\colon S \to T$ be a map between finite sets.

(a) If $\varphi$ is injective, then $|S| \le |T|$.

(b) If $\varphi$ is surjective, then $|S| \ge |T|$.

(c) If $|S| = |T|$, then $\varphi$ is bijective if and only if it is either injective or
surjective. □

The contrapositive of part (a) is often called the pigeonhole principle: If $|S| > |T|$,
then $\varphi$ is not injective. For example, if there are 87 socks in 79 drawers, then some
drawer contains at least two socks.
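The pigeonhole principle guarantees a collision but does not say where it is; for a concrete finite map one can simply search. The following sketch uses a hypothetical helper `find_collision`, and the assignment of sock s to drawer s mod 79 is an arbitrary illustration, not anything from the text.

```python
def find_collision(phi, S):
    """Return a pair (s1, s2) of distinct elements of S with phi(s1) == phi(s2), or None."""
    seen = {}  # maps each image t to the first element that was sent to it
    for s in S:
        t = phi(s)
        if t in seen:
            return seen[t], s
        seen[t] = s
    return None

# 87 socks placed into 79 drawers by an arbitrary rule.
socks = range(87)
drawer = lambda s: s % 79
print(find_collision(drawer, socks))
```

When the function returns None, the map is injective on S, which by the theorem can only happen if S has at most as many elements as the range.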

An infinite set S is called countable if there is a bijective map $\varphi\colon \mathbb{N} \to S$ from
the set of natural numbers to S. If there is no such map, then S is said to be
uncountable.


(1.7) Proposition. The set $\mathbb{R}$ of real numbers is uncountable.

Proof. This proof is often referred to as Cantor’s diagonal argument. Let
$\varphi\colon \mathbb{N} \to \mathbb{R}$ be any map. We list the elements of the image of $\varphi$ in the order $\varphi(1)$, $\varphi(2)$,
$\varphi(3)$, ..., and we write each of these real numbers in decimal notation. For example,
the list might begin as follows:

$\varphi(1)$ = 82.[3]5470984534...

$\varphi(2)$ = .1[2]390345700...

$\varphi(3)$ = 5.90[8]40598675...

$\varphi(4)$ = 12.874[3]5264444...

$\varphi(5)$ = .0014[4]100349...


We will now determine a real number which is not on the list. Consider the real
number u whose decimal expansion consists of the bracketed digits, that is, the nth
digit after the decimal point of $\varphi(n)$: u = .32834.... We form a new real number by
changing each of these digits, say

v = .45142... .

Notice that $v \neq \varphi(1)$, because the first digit, 4, of v is not equal to the
corresponding digit, 3, of $\varphi(1)$. Also, $v \neq \varphi(2)$, because the second digit, 5, of v is not equal
to the corresponding digit of $\varphi(2)$. Similarly, $v \neq \varphi(n)$ for all n. This shows that $\varphi$
is not surjective, which completes the proof, except for one point.

Some real numbers have two decimal expansions: .99999... is equal to
1.00000..., for example. This creates a problem with our argument. We have to
choose v so that infinitely many of its digits are different from 9 and 0. The easiest
way is to avoid these digits altogether. □
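The diagonal construction can be imitated on any finite initial segment of a list. In the sketch below, each real number is represented only by the string of digits after its decimal point, the helper name `diagonal_escape` is hypothetical, and the replacement rule (use 5, or 4 when the diagonal digit is already 5) is one arbitrary choice among many that avoid the digits 0 and 9.

```python
def diagonal_escape(expansions):
    """Return digits d_1 d_2 ... d_n with d_i different from the i-th digit of expansions[i-1].

    expansions is a list of digit strings (digits after the decimal point);
    row i must contain at least i digits. The digits 0 and 9 are never produced.
    """
    digits = []
    for i, row in enumerate(expansions):
        diagonal_digit = row[i]
        digits.append("4" if diagonal_digit == "5" else "5")
    return "".join(digits)

# Fractional parts of the five sample numbers listed in the proof.
rows = ["35470984534", "12390345700", "90840598675", "87435264444", "00144100349"]
v = diagonal_escape(rows)
print("." + v)
```

The result differs from the text's choice v = .45142..., which is fine: any rule that changes every diagonal digit and stays away from 0 and 9 serves the argument equally well.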
At a few places in the text, we refer to Zorn’s Lemma, which is a tool for
handling uncountable sets. We will now describe it. A partial ordering of a set S is a
relation $s \le s'$ which may hold between certain elements and which satisfies the
following axioms for all $s, s', s''$ in S:

(1.8)

(i) $s \le s$;

(ii) if $s \le s'$ and $s' \le s''$, then $s \le s''$;

(iii) if $s \le s'$ and $s' \le s$, then $s = s'$.

A partial ordering is called a total ordering if in addition

(iv) for all $s, s'$ in S, $s \le s'$ or $s' \le s$.

For example, let S be a set whose elements are sets. If A, B are in S, we may
define $A \le B$ if $A \subset B$. This is a partial ordering on S, called the ordering by
inclusion. Whether or not it is a total ordering depends on the particular case.

If A is a subset of a partially ordered set S, then an upper bound for A is an
element $b \in S$ such that for all $a \in A$, $a \le b$. A partially ordered set S is called
inductive if every totally ordered subset T of S has an upper bound in S.

A maximal element $m \in S$ is any element such that S contains no larger one,
that is, such that there is no element $s \in S$ with $m \le s$, except for m itself. This
doesn’t mean that m is an upper bound for S; in particular, there may be many
different maximal elements. For example, the set of all proper subsets of $\{1,\dots,n\}$
contains n maximal elements, one of which is $\{1, 3, 4, \dots, n\}$.
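The example of proper subsets ordered by inclusion is small enough to enumerate. The sketch below checks the claim for one value of n; the helper names `proper_subsets` and `maximal_elements` are assumptions made here, and Python's `<` on frozensets is used as the strict inclusion relation.

```python
from itertools import combinations

def proper_subsets(n):
    """All proper subsets of {1, ..., n}, as frozensets."""
    universe = range(1, n + 1)
    return [frozenset(c) for k in range(n) for c in combinations(universe, k)]

def maximal_elements(family):
    """Members of the family that are not strictly contained in another member."""
    return [a for a in family if not any(a < b for b in family)]

n = 4
maximal = maximal_elements(proper_subsets(n))
print(sorted(sorted(m) for m in maximal))
```

None of the maximal elements found this way is an upper bound for the whole family, which illustrates the distinction drawn in the text.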

(1.9) Lemma. Zorn's Lemma: An inductive partially ordered set has a maximal 
element. □ 

Zorn’s Lemma is equivalent to the axiom of choice, which is known to be
independent of the basic axioms of set theory. We will not enter into a further discussion
of this equivalence, but we will show how Zorn’s Lemma can be used to show that
every vector space has a basis. Let us use unordered sets of vectors here.

(1.10) Proposition. Every vector space V over a field has a basis.

Proof. We take for S the set of (unordered) linearly independent subsets of V,
partially ordered by inclusion, as above. We check that S is inductive: Let T be a
totally ordered subset of S. Then we claim that the union of the sets making up T is
also linearly independent; hence it is in S. To verify this, let

$$B = \bigcup_{A \in T} A$$

be the union. By definition, a relation of linear dependence on B is finite, so it can
be written in the form

(1.11) $c_1v_1 + \cdots + c_nv_n = 0$,

with $v_i \in B$. Since B is a union of the sets $A \in T$, each $v_i$ is contained in one of
these subsets, call it $A_i$. Let i, j be two of the indices. Since T is totally ordered,
$A_i \subset A_j$ or else $A_j \subset A_i$. It follows by induction that one of the sets, say $A_j$, contains
all the others. Call this set A. Then $v_i \in A$ for all $i = 1,\dots,n$. Since A is linearly
independent, (1.11) is the trivial relation. This shows that B is linearly independent,
hence that it is an element of S.

We have verified the hypothesis of Zorn’s Lemma. So S contains a maximal
element B, and we claim that B is a basis. By definition of S, B is linearly
independent. Let W = Span(B). If W < V, then we choose an element $v \in V$ which is not
in W. Then the set $B \cup \{v\}$ is linearly independent [see Chapter 3 (3.10)]. This
contradicts the maximality of B and shows that W = V, hence that B is a basis. □

A similar argument proves Theorem (8.3) of Chapter 10. 

(1.12) Proposition. Let R be a ring. Every ideal $I \neq R$ is contained in a maximal
ideal.

We leave this proof as an exercise. □


2. TECHNIQUES OF PROOF 

Exactly what mathematicians consider an appropriate way to present a proof is not 
clearly defined. It isn’t customary to give proofs which are complete in the sense 
that every step consists in applying a rule of logic to the previous step. Writing such 
a proof would take too long, and the main points wouldn’t be emphasized. On the 
other hand, all difficult steps of the proof are supposed to be included. Someone 
reading the proof should be able to fill in as many details as needed to understand it. 
How to write a proof is a skill which can be learned only by experience. 

We will discuss three important techniques used to construct proofs:
dichotomy, induction, and contradiction.

The word dichotomy means division into two parts. It is used to subdivide a 
problem into smaller，more easily managed pieces. Other names for this procedure 
are case analysis and divide and conquer• Here is an example of dichotomy: One 
definition of the binomial coefficient (j?) (read n choose k) is that (j?) is the number of 
subsets of order k in the set {i ， 2 , …， n}. For example, (2) = 6 : The six subsets of or- 
der2of{l,2,3,4} are {1,2}, {1,3}, {1,4}, {2,3} ， {2,4}, {3,4}. 

(2.1) Proposition. For every integer $n$ and every $k \leq n$, $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$.

Proof. Let $S$ be a subset of $\{1, 2, \ldots, n\}$ of order $k$. Then either $n \in S$ or
$n \notin S$. This is our dichotomy. If $n \notin S$, then $S$ is actually a subset of $\{1, 2, \ldots, n - 1\}$.
By definition, there are $\binom{n-1}{k}$ of these subsets. Suppose that $n \in S$, and let $S' =
S - \{n\}$ be the set obtained by deleting the element $n$ from the set $S$. Then $S'$ is a
subset of $\{1, 2, \ldots, n - 1\}$, of order $k - 1$. There are $\binom{n-1}{k-1}$ such sets $S'$. Hence there
are $\binom{n-1}{k-1}$ subsets of order $k$ which contain $n$. This gives us $\binom{n-1}{k} + \binom{n-1}{k-1}$ subsets of
order $k$ altogether. □
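
The dichotomy in this proof can be checked by brute force for small values. The following is a minimal sketch, not part of the original text: the function names `count_subsets` and `split_by_dichotomy` are hypothetical, and the subsets are simply listed with the standard library.

```python
from itertools import combinations


def count_subsets(n, k):
    """Count the subsets of order k in {1, ..., n} by listing them."""
    return sum(1 for _ in combinations(range(1, n + 1), k))


def split_by_dichotomy(n, k):
    """Split the k-subsets of {1, ..., n} into those without n and those with n."""
    without_n, with_n = [], []
    for subset in combinations(range(1, n + 1), k):
        (with_n if n in subset else without_n).append(subset)
    return without_n, with_n


# The example from the text: the six subsets of order 2 of {1, 2, 3, 4}.
without_4, with_4 = split_by_dichotomy(4, 2)
print(len(without_4), len(with_4), count_subsets(4, 2))  # 3 3 6
```

The two lists correspond to the two cases of the proof: the first is counted by $\binom{n-1}{k}$ and the second by $\binom{n-1}{k-1}$.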
The remarkable power of the method of dichotomy is shown here: In each of
the two cases, $n \in S$ and $n \notin S$, we have an additional fact about our set $S$. This
additional fact can be used in the proof.

Often a proof will require sorting through several possibilities, examining each
in turn. This is dichotomy, or case analysis. For instance, to determine the species of
a plant, Gray’s Manual of Botany leads through a sequence of dichotomies. A typical
one is “leaves opposite on the stem (go to h), or leaves alternate (go to k).”
Classification of mathematical structures will also proceed through a sequence of
dichotomies. They need not be spelled out formally in simple cases, but when one is
dealing with a complicated range of possibilities, careful sorting is needed. Here is a
simple example:

(2.2) Proposition. Every group of order 4 is abelian. 

Proof. Let $G$ be a group of order 4, and let $x, y$ be two elements of $G$. We are
to show that $xy = yx$. Consider the five elements $1, x, y, xy, yx$. Since there are only
four elements in the group, two of these must be equal. If $xy = yx$, the proposition
is verified. We now run through the other possibilities:

Case 1: $x = 1$ or $y = 1$. If $x = 1$, then $xy = y = yx$. If $y = 1$, then $xy = x = yx$.

Case 2: $xy = 1$ or $yx = 1$. Then $y = x^{-1}$, and $xy = 1 = yx$.

Case 3: $x = y$. Then $xy = x^2 = yx$.

Case 4: Either $xy = x$, $yx = x$, $xy = y$, or $yx = y$. In the first two cases, we cancel
$x$ to conclude that $y = 1$, which puts us back in Case 1. In the last two
cases, we cancel $y$ to conclude that $x = 1$, which is again Case 1.

This exhausts all possibilities and completes the proof. □
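
Proposition (2.2) can also be confirmed by exhaustive search, which is case analysis carried out by machine. The sketch below is an illustration only, under these assumptions: the elements are labeled 0, 1, 2, 3 with 0 as the identity, and the hypothetical helper `group_tables` keeps every table whose rows are permutations and whose law is associative (such a table is a group, since each element then has a right inverse).

```python
from itertools import permutations, product


def group_tables(n):
    """Yield every multiplication table on {0, ..., n-1} that is a group with identity 0."""
    elements = range(n)
    # Row a lists the products a*b. Each row must be a permutation of the
    # elements, and its first entry is a*0 = a because 0 is the identity.
    rows_for = [
        [p for p in permutations(elements) if p[0] == a] for a in elements
    ]
    rows_for[0] = [tuple(elements)]  # 0*b = b
    for table in product(*rows_for):
        if all(
            table[table[a][b]][c] == table[a][table[b][c]]
            for a in elements for b in elements for c in elements
        ):
            yield table


def is_abelian(table):
    """Check that a*b = b*a for all pairs of elements."""
    n = len(table)
    return all(table[a][b] == table[b][a] for a in range(n) for b in range(n))


tables = list(group_tables(4))
print(len(tables), all(is_abelian(t) for t in tables))  # 4 True
```

The search is practical only for very small orders, because the number of candidate tables grows rapidly with $n$.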

Induction is the main method for proving a sequence of statements $P_n$, indexed
by positive integers $n$. To prove $P_n$ for all $n$, the principle of induction requires us to
do two things:

(2.3)

(i) prove that $P_1$ is true, and

(ii) prove that if, for some integer $k \geq 1$, $P_k$ is true, then $P_{k+1}$ is also true.

Sometimes it is more convenient to prove that if, for some integer $k > 1$, $P_{k-1}$ is
true, then $P_k$ is true. This is just a change of the index.

Here are some examples of induction: 

(2.4) Proposition. The determinant of an upper triangular matrix is the product of
its diagonal entries.

Proof. Here $P_n$ is the assertion that the proposition is true for an $n \times n$ triangular
matrix. In case of a $1 \times 1$ matrix, there is only one diagonal entry, and it is
equal to the determinant. This means that $P_1$ is true. We now assume that $P_{k-1}$ is
true, and we prove $P_k$ using that fact. Let $A$ be a triangular $k \times k$ matrix. We expand
the determinant by minors on the first column:

\[ \det A = a_{11} \det A_{11} - a_{21} \det A_{21} + \cdots . \]

Since $A$ is triangular, the terms $a_{21}, a_{31}, \ldots$ are all zero, so $\det A = a_{11} \det A_{11}$.
Now notice that $A_{11}$ is a $(k - 1) \times (k - 1)$ triangular matrix and that its diagonal
entries are $a_{22}, a_{33}, \ldots, a_{kk}$. Since $P_{k-1}$ is true by hypothesis, $\det A_{11}$ is the product
$a_{22} \cdots a_{kk}$. Therefore $\det A = a_{11} a_{22} \cdots a_{kk}$, as required. □
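
The expansion used in this proof translates directly into a recursive procedure. The following sketch is an illustration, not the author's code: the helper names `minor`, `det`, and `diagonal_product` and the sample matrix are assumptions made for the example.

```python
def minor(a, i, j):
    """Return the matrix a with row i and column j removed."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant by expansion by minors on the first column."""
    if len(a) == 1:
        return a[0][0]
    return sum(
        (-1) ** i * a[i][0] * det(minor(a, i, 0))
        for i in range(len(a))
    )


def diagonal_product(a):
    """Product of the diagonal entries of a square matrix."""
    result = 1
    for i in range(len(a)):
        result *= a[i][i]
    return result


upper = [
    [2, 7, -1, 4],
    [0, 3, 5, 9],
    [0, 0, -1, 6],
    [0, 0, 0, 5],
]
print(det(upper), diagonal_product(upper))  # -30 -30
```

For an upper triangular matrix, only the term with $i = 0$ in the sum is nonzero, which is exactly the step $\det A = a_{11} \det A_{11}$ of the proof.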


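(2.5) Proposition. $\binom{n}{k} = \dfrac{n!}{k!\,(n - k)!}$.

Proof. Let $P_r$ be the statement that $\binom{r}{k} = \dfrac{r!}{k!\,(r - k)!}$ for all $k = 1, \ldots, r$. Assume
that $P_{r-1}$ is true. Then the formula is true when we substitute $n = r - 1$ and
$k = k$ and is also true when we substitute $n = r - 1$ and $k = k - 1$:

\[
\binom{r-1}{k} = \frac{(r - 1)!}{k!\,(r - 1 - k)!} \quad \text{and} \quad \binom{r-1}{k-1} = \frac{(r - 1)!}{(k - 1)!\,(r - k)!}.
\]

According to Proposition (2.1), $\binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1}$. Thus

\[
\binom{r}{k} = \binom{r-1}{k} + \binom{r-1}{k-1}
= \frac{(r - 1)!}{k!\,(r - 1 - k)!} + \frac{(r - 1)!}{(k - 1)!\,(r - k)!}
= \frac{r - k}{r} \cdot \frac{r!}{k!\,(r - k)!} + \frac{k}{r} \cdot \frac{r!}{k!\,(r - k)!}
= \frac{r!}{k!\,(r - k)!}.
\]

This shows that $P_r$ is true, as required. □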
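
As a sanity check on this induction, one can compute binomial coefficients from the recursion of Proposition (2.1) alone and compare them with the factorial formula. This is a sketch under stated assumptions: the names `binomial` and `factorial_formula` are hypothetical, and the base cases count subsets of order 0 and subsets of the empty set.

```python
from math import factorial


def binomial(n, k):
    """Binomial coefficient computed only from the recursion of Proposition (2.1)."""
    if k == 0:
        return 1  # the empty set is the only subset of order 0
    if n == 0:
        return 0  # the empty set has no subsets of positive order
    return binomial(n - 1, k) + binomial(n - 1, k - 1)


def factorial_formula(n, k):
    """The closed form n! / (k! (n - k)!) of Proposition (2.5)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# Compare the two for every statement P_r with r < 10.
for r in range(1, 10):
    for k in range(1, r + 1):
        assert binomial(r, k) == factorial_formula(r, k)

print(binomial(4, 2))  # 6
```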


As another example, let us prove the pigeonhole principle (1.6a), that if a map
$\varphi\colon S \to T$ between finite sets is injective, then $|S| \leq |T|$. We use induction on
$n = |T|$. The assertion is true if $n = 0$, that is, if $T$ is empty, because the only set
which has a map to the empty set is the empty set.

We suppose that the theorem has been proved for $n = k - 1$, and we proceed
to check it for $n = k$, where $k > 0$. We suppose that $|T| = k$, and we choose an
element $t \in T$.

Case 1: $t$ is in the image of $\varphi$. Since $\varphi$ is injective, there is exactly one element
$s \in S$ such that $\varphi(s) = t$. Let $S' = S - \{s\}$ and $T' = T - \{t\}$. Restricting
$\varphi$ to $S'$, we obtain an injective map $\varphi'\colon S' \to T'$. Since $|T'| =
|T| - 1 = k - 1$, our induction hypothesis implies that $|S'| \leq |T'|$.
Therefore $|S| = |S'| + 1 \leq |T'| + 1 = |T|$.

Case 2: $t$ is not in $\operatorname{im} \varphi$. In this case the image of $\varphi$ is contained in $T' = T - \{t\}$.
So $\varphi$ defines an injective map $S \to T'$. Our induction hypothesis again
implies that $|S| \leq |T'| = |T| - 1$. □
There is a variant of the principle of induction, called complete induction. Here
again, we wish to prove a statement $P_n$ for each positive integer $n$. The principle of
complete induction asserts that it is enough to prove the following statement:

(2.6) If $n$ is a positive integer, and if $P_k$ is true
for every positive integer $k < n$, then $P_n$ is true.

When $n = 1$, there are no positive integers $k < n$. So the hypothesis of (2.6) is
automatically satisfied for $n = 1$. Hence a proof of (2.6) must include a proof of $P_1$.
The principle of complete induction is used when there is a procedure to reduce
$P_n$ to $P_k$ for some smaller integers $k$, but not necessarily to $P_{n-1}$. Here is an
example:

(2.7) Theorem. Every integer n > 1 is a product of prime integers. 


An informal proof, which also exhibits an algorithm for finding a prime factorization,
goes as follows: If $n$ is a prime integer, then it is the product of one prime, and
we are done. If not, then it has a divisor different from 1 and $n$. If $n$ is given to us
explicitly, we will be able to check whether or not there is such a proper divisor. If
so, then $n$ can be written as a product of integers, say $n = ab$, neither of which is 1,
and then $a$ and $b$ are less than $n$. We continue factoring $a$ and $b$ if possible. Since the
size of the factors decreases each time, this procedure cannot be continued
indefinitely, and eventually we end up with a prime factorization of $n$.

The principle of complete induction formalizes the statement that one can’t
continue replacing a positive integer by a smaller one infinitely often. To apply the
principle, we let $P_n$ be the statement that $n$ is a product of primes, and we assume
that $P_k$ is true for all $k < n$. We go through the argument again. Either $n$ is prime, in
which case we are done, or else $n = ab$ and $a$ and $b$ are less than $n$. In this case the
induction hypothesis tells us that $P_a$ and $P_b$ are both true, that is, that $a$ and $b$ are
products of primes. Putting these products side by side gives us the required
factorization of $n$.

The two proofs look slightly different from each other, because the algorithm
is not mentioned in the statement of the theorem and has been partially suppressed
in the formal proof. A better statement of the theorem would exhibit the algorithm:

(2.8) Theorem. The procedure of factoring an integer $n > 1$ terminates after
finitely many steps.

In this formulation, the formal proof becomes identical with the informal one. □
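
The factoring procedure described above can be written as a short recursive function whose structure mirrors the complete induction. This is a minimal sketch and not an efficient algorithm: the names `proper_divisor` and `prime_factors` are hypothetical, and trial division is one assumed way of checking for a proper divisor.

```python
def proper_divisor(n):
    """Return a divisor d of n with 1 < d < n, or None if n is prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None


def prime_factors(n):
    """Factor an integer n > 1 by the recursive procedure of the informal proof."""
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    a = proper_divisor(n)
    if a is None:
        return [n]  # n is prime: a product of one prime
    b = n // a      # n = ab, and both a and b are less than n
    # The recursive calls play the role of the induction hypotheses P_a and P_b.
    return sorted(prime_factors(a) + prime_factors(b))


print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```

Each recursive call is made on a strictly smaller integer greater than 1, which is why the recursion terminates, as Theorem (2.8) asserts.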

Proofs by contradiction proceed by assuming that the desired conclusion is
false and deriving a contradiction from this assumption. The conclusion must
therefore be true. We can, for example, rewrite the proof given above that a group of
order 4 is abelian, in this way:

Proof of (2.2), Rewritten. We suppose that $G$ is a nonabelian group of order 4,
and we proceed to derive a contradiction from this assumption. Since $G$ is not
abelian, there are elements $x, y \in G$ such that $xy \neq yx$. Then $y$ cannot be any one of
the elements $1, x, x^{-1}$, because those elements commute with $x$. Similarly, $x$ is not
equal to $1$, $y$, or $y^{-1}$. We may now check that the elements $1, x, y, xy, yx$ are distinct.
This contradicts the hypothesis that $|G| = 4$. Therefore there does not exist a
nonabelian group of order 4. □

Notice that there is no real difference between the two proofs of (2.2). The proof
just given is really a fake contradiction argument, and, though logically correct, it is
not aesthetically pleasing. One should avoid writing proofs in this way. On the other
hand, there are true proofs by contradiction, in which the proof is not easily turned
around to eliminate the contradiction. The proof given in the text [Chapter 6 (1.13)]
that a group of order $p^2$, $p$ a prime, is abelian is an example, as is the proof of (3.11)
given below.


3. TOPOLOGY 


This section reviews some concepts from topology which we will need from time to
time. The sets which we want to study are subsets of Euclidean space $\mathbb{R}^k$.

Let $r$ be a positive real number. The open ball of radius $r$ about a point
$X \in \mathbb{R}^k$ is the set of all points whose distance to $X$ is less than $r$:

(3.1) $B_{X,r} = \{X' \in \mathbb{R}^k \mid |X' - X| < r\}.$

A subset $U$ of $\mathbb{R}^k$ is called open if whenever a point $X$ lies in $U$, the points sufficiently
near to $X$ also lie in $U$. In other words, $U$ is open if it satisfies the following
condition:

(3.2) If $X \in U$ and if $r$ is sufficiently small, then $B_{X,r} \subset U$.

The radius $r$ will depend on the point $X$.

Open sets have the following properties: 


(3.3) 

(i) The union of an arbitrary family of open sets is open. 

(ii) The intersection of finitely many open sets is open. 

The whole space $\mathbb{R}^k$ and the empty set $\emptyset$ are the simplest examples of open
sets. Some more interesting open sets are obtained in this way: Let $f$ be a continuous
function $\mathbb{R}^k \to \mathbb{R}$. Then the sets

(3.4) $\{f > 0\}, \quad \{f < 0\}, \quad \{f \neq 0\}$

are open. For instance, if $f(X) > 0$, then $f(X') > 0$ for all $X'$ near $X$, because $f$ is
continuous. This shows that the general linear group $GL_2(\mathbb{R})$ is an open subset of the
space $\mathbb{R}^{2 \times 2}$ of all $2 \times 2$ matrices, because it is the set $\{\det P \neq 0\}$. Also, the open ball
$B_{X,r}$ is an open set in $\mathbb{R}^k$, because it is defined by the inequality $|X' - X| - r < 0$.

Let $S$ be any set in $\mathbb{R}^k$. We will also need the concept of open subset of $S$. By
definition, a subset $V$ of $S$ is called open in $S$ if whenever it contains a point $X$, then
it also contains all points of $S$ which are sufficiently near to $X$. This condition is
explained by the following lemma:

(3.5) Lemma. Let $V$ be a subset of a set $S$ in $\mathbb{R}^k$. The following conditions on $V$
are equivalent. If either one of them holds, then $V$ is called an open subset of $S$:

(i) $V = U \cap S$ for some open set $U$ of $\mathbb{R}^k$;

(ii) For every point $X \in V$, there is an $r > 0$ so that $V$ contains the set $B_{X,r} \cap S$.

Proof. Assume that $V = U \cap S$ for some open set $U$ of $\mathbb{R}^k$. Let $X \in V$. Then
$X \in U$, and (3.2) guarantees the existence of an $r > 0$ such that $B_{X,r} \subset U$. So
$B_{X,r} \cap S \subset U \cap S = V$, and (ii) is verified. Conversely, suppose that (ii) holds.
For each $X \in V$, choose an open ball $B_{X,r}$ such that $B_{X,r} \cap S \subset V$, with the radius $r$
depending as usual on the point $X$. Let $U$ be the union of these balls. Then $U$ is an
open set in $\mathbb{R}^k$ (3.3i), and $U \cap S \subset V$. On the other hand, $X \in B_{X,r} \cap S \subset U \cap S$
for every $X \in V$. Therefore $V \subset U \cap S$, and $V = U \cap S$ as required. □

Open subsets of $S$ have the properties (3.3), which follow from the same properties
of open subsets of $\mathbb{R}^k$ because of (3.5i).

It is customary to speak of an open set $V$ of $S$ which contains a given point $p$ as
a neighborhood of $p$ in $S$.

A subset $C$ of a set $S$ is called closed if its complement $(S - C)$ is open. For
example, let $f_i\colon \mathbb{R}^k \to \mathbb{R}$ $(i = 1, \ldots, k)$ be continuous functions. The locus

(3.6) $\{f_1 = f_2 = \cdots = f_k = 0\}$

of solutions to the system of $k$ equations $f_i = 0$ is a closed set in $\mathbb{R}^k$, because its
complement is the union of the open sets $\{f_i \neq 0\}$. The 2-sphere $\{x_1^2 + x_2^2 + x_3^2 = 1\}$ is
an example of a closed set in $\mathbb{R}^3$. So is the rotation group $SO_2$. It is the locus in $\mathbb{R}^{2 \times 2}$
defined by the five equations $\det X = 1$ and $X X^t = I$, written out entry by entry:

\[
x_{11}x_{22} - x_{12}x_{21} = 1, \quad x_{11}^2 + x_{12}^2 = 1, \quad x_{21}^2 + x_{22}^2 = 1,
\]
\[
x_{11}x_{21} + x_{12}x_{22} = 0, \quad x_{21}x_{11} + x_{22}x_{12} = 0.
\]

Closed sets have properties dual to (3.3): 

(3.7) 

(i) The intersection of an arbitrary family of closed sets is closed. 

(ii) The union of finitely many closed sets is closed. 

These rules follow from (3.3) by complementation. 
A subset $C$ of $\mathbb{R}^n$ is called bounded if the coordinates of the points in $C$ are
bounded, meaning that there is a positive real number $b$, a bound, such that for
$X = (x_1, \ldots, x_n) \in C$,

(3.8) $|x_i| < b,$

for all $i = 1, \ldots, n$. If $C$ is both closed and bounded, it is called a compact subset of
$\mathbb{R}^n$. The unit 2-sphere is a compact set in $\mathbb{R}^3$.

Let $S, T$ be subsets of $\mathbb{R}^m$ and $\mathbb{R}^n$. A map $f\colon S \to T$ is called continuous if it
carries nearby points of $S$ to nearby points of $T$. Formally, the property of continuity is
stated this way:

(3.9) Let $s \in S$. For every real number $\epsilon > 0$, there is a $\delta > 0$ such that if
$s' \in S$ and $|s' - s| < \delta$, then $|f(s') - f(s)| < \epsilon$.

The easiest way to get a continuous map from $S$ to $T$ is as a restriction of a continuous
map $F\colon \mathbb{R}^m \to \mathbb{R}^n$ which happens to carry $S$ to $T$. Most of the maps we use are of
this form. For example, the determinant is a continuous function from any one of
the classical groups to $\mathbb{R}$ or $\mathbb{C}$.

A map $f\colon S \to S'$ is called a homeomorphism if it is bijective and if $f^{-1}$, as well
as $f$, is continuous.

For example, the unit circle $S^1$ in $\mathbb{R}^2$ is homeomorphic to the rotation group
$SO_2$. The homeomorphism $f\colon S^1 \to SO_2$ is given by restricting the map

\[
F(x_1, x_2) = \begin{bmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{bmatrix},
\]

which carries $\mathbb{R}^2$ to the space $\mathbb{R}^4$ of $2 \times 2$ matrices. The map $F$ is not bijective and
is therefore not a homeomorphism, but it restricts to a homeomorphism $f$ on the
subsets $S^1$ and $SO_2$. Its inverse is the restriction to $SO_2$ of the projection $G\colon \mathbb{R}^4 \to \mathbb{R}^2$
which sends a $2 \times 2$ matrix to its top row. (The word homeomorphism must not be
confused with homomorphism!)
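
The maps in this example are easy to test numerically at sample points of the circle. The sketch below is an illustration only: the function names `F`, `G`, and `in_SO2` follow the notation of the text, and membership in $SO_2$ is checked through the assumed conditions $A A^t = I$ and $\det A = 1$, up to a rounding tolerance.

```python
import math


def F(x1, x2):
    """The map F from R^2 to 2 x 2 matrices, written as nested tuples."""
    return ((x1, x2), (-x2, x1))


def G(matrix):
    """Projection of a 2 x 2 matrix to its top row."""
    return matrix[0]


def in_SO2(m, tol=1e-12):
    """Check m * m^t = I and det m = 1 up to a rounding tolerance."""
    (a, b), (c, d) = m
    return (
        abs(a * a + b * b - 1) < tol
        and abs(c * c + d * d - 1) < tol
        and abs(a * c + b * d) < tol
        and abs(a * d - b * c - 1) < tol
    )


for step in range(8):
    t = 2 * math.pi * step / 8
    point = (math.cos(t), math.sin(t))
    assert in_SO2(F(*point))      # F carries the unit circle into SO_2
    assert G(F(*point)) == point  # G undoes F on the circle

# A point off the circle is sent to a matrix outside SO_2.
print(in_SO2(F(2.0, 0.0)))  # False
```

Checking finitely many points does not prove continuity or bijectivity; it only illustrates how $F$ and $G$ act.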


A path is a continuous map $f\colon [0, 1] \to \mathbb{R}^k$ from the unit interval to the space
$\mathbb{R}^k$, and the path is said to lie in $S$ if $f(t) \in S$ for every $t \in [0, 1]$. A subset $S$ of $\mathbb{R}^k$
is called path-connected if every pair of points $p, q \in S$ can be joined by a path lying
in $S$. In other words, for every pair of points $p, q \in S$, there is a path $f$ such that

(3.10)

(i) $f(t) \in S$ for all $t$ in the interval;

(ii) $f(0) = p$ and $f(1) = q$.

Here is the most important property of path-connected sets:

(3.11) Proposition. A path-connected set $S$ is not the disjoint union of proper
open subsets. In other words, suppose that

\[ S = \bigcup_i V_i, \]

where $V_i$ are open sets in $S$ and $V_i \cap V_j = \emptyset$ if $i \neq j$. Then all but one of the sets $V_i$
is empty.

Proof. Suppose that two of the sets are nonempty, say $V_0$ and $V_1$. We set aside
$V_0$ and replace $V_1$ by the union of the remaining subsets, which is open by (3.3).
Then $V_0 \cup V_1 = S$ and $V_0 \cap V_1 = \emptyset$. This reduces to the case that there are exactly
two open sets.

Choose points $p \in V_0$ and $q \in V_1$, and let $f\colon [0, 1] \to S$ be a path in $S$
connecting $p$ to $q$. We will obtain a contradiction by examining the path at the point
where it leaves $V_0$ for the last time.

Let $b$ be the least upper bound of all $t \in [0, 1]$ such that $f(t) \in V_0$, and let
$X = f(b)$. If $X \in V_0$, then all points of $B_{X,r} \cap S$ are in $V_0$, if $r$ is small enough.
Since $f$ is continuous, $f(t) \in B_{X,r}$ for all $t$ sufficiently near $b$. So $f(t) \in V_0$ for these
points. Taking $t$ slightly larger than $b$ contradicts the choice of $b$ as an upper bound
of the points mapping to $V_0$. Therefore $X$ is not in $V_0$, so it has to be in $V_1$. But
reasoning in the same way, we find that $f(t) \in V_1$ for all $t$ sufficiently near $b$. Taking $t$
slightly smaller than $b$ contradicts the choice of $b$ as the least upper bound of points
mapping to $V_0$. This contradiction completes the proof. □

The final concept from topology is that of manifold.

(3.12) Definition. A subset $S$ of $\mathbb{R}^n$ is called a manifold of dimension $d$ if every
point $p$ of $S$ has a neighborhood in $S$ which is homeomorphic to an open set in $\mathbb{R}^d$.

For example, the sphere $S^2 = \{(x, y, z) \mid x^2 + y^2 + z^2 = 1\}$ is a two-dimensional
manifold. The half sphere $U = \{z > 0\}$ is open in $S^2$ (3.4, 3.5) and projects continuously
to the unit ball $B_{0,1} = \{x^2 + y^2 < 1\}$ in $\mathbb{R}^2$. The inverse function $z =
\sqrt{1 - x^2 - y^2}$ is continuous. Therefore $U$ is homeomorphic to $B_{0,1}$. Since the
2-sphere is covered by such half spheres, it is a manifold.

The figure below shows a set which is not a manifold. It becomes a manifold
of dimension 1 when the point $p$ is deleted. Note that homogeneity is false for this
set. It looks different near $p$ from how it looks near the other points.



(3.13) Figure. A set which is not a manifold. 


4. THE IMPLICIT FUNCTION THEOREM

The Implicit Function Theorem is used at two places in this book, so we state it here
for reference.

(4.1) Theorem. Implicit Function Theorem: Let $f(x, y) = (f_1(x, y), \ldots, f_r(x, y))$ be
functions of $n + r$ real variables $(x, y) = (x_1, \ldots, x_n, y_1, \ldots, y_r)$, which have continuous
partial derivatives in an open set of $\mathbb{R}^{n+r}$ containing the point $(a, b)$. Assume
that $f(a, b) = 0$ and that the Jacobian determinant

\[ \det \left[ \frac{\partial f_i}{\partial y_j} \right]_{i, j = 1, \ldots, r} \]

is not zero at the point $(a, b)$. There is a neighborhood $U$ of the point $a$ in $\mathbb{R}^n$ such
that there are unique continuously differentiable functions $Y_1(x), \ldots, Y_r(x)$ on $U$
satisfying

\[ f(x, Y(x)) = 0 \quad \text{and} \quad Y(a) = b. \]

The Implicit Function Theorem is closely related to the Inverse Function Theorem,
which is used in Chapter 8 (5.8):

(4.2) Theorem. Inverse Function Theorem: Let $f$ be a continuously differentiable
map from an open set $U$ of $\mathbb{R}^n$ to $\mathbb{R}^n$. Assume that the Jacobian determinant

\[ \det \left[ \frac{\partial f_i}{\partial x_j} \right]_{i, j = 1, \ldots, n} \]

is not zero at a point $a \in U$. There is a neighborhood of $a$ on which $f$ has a
continuously differentiable inverse function.

We refer to the book by Rudin listed in the Suggestions for Further Reading for 
proofs of these two theorems. □ 

We also use the following complex analogue of the Implicit Function Theorem
in one place [Chapter 13 (8.14)]:

(4.3) Theorem. Let $f(x, y)$ be a complex polynomial. Suppose that for some
$(a, b) \in \mathbb{C}^2$, we have $f(a, b) = 0$ and $\frac{\partial f}{\partial y}(a, b) \neq 0$. There is a neighborhood $U$ of $a$
in $\mathbb{C}$ on which a unique continuous function $Y(x)$ exists having the properties

\[ f(x, Y(x)) = 0 \quad \text{and} \quad Y(a) = b. \]

Since references for this extension are not so common, we will give a proof which
reduces it to the real Implicit Function Theorem. The method is simply to write
everything in terms of its real and imaginary parts and then to verify the hypotheses of
(4.1). The same argument will apply with more variables.

Proof. We write $x = x_0 + x_1 i$, $y = y_0 + y_1 i$, $f = f_0 + f_1 i$, where $f_i =
f_i(x_0, x_1, y_0, y_1)$ is a real-valued function of four real variables. We are to solve the
pair of equations $f_0 = f_1 = 0$ for $y_0, y_1$ as functions of $x_0, x_1$. According to (4.1), we
have to prove that the Jacobian determinant

\[
\det \begin{bmatrix} \dfrac{\partial f_0}{\partial y_0} & \dfrac{\partial f_0}{\partial y_1} \\[2ex] \dfrac{\partial f_1}{\partial y_0} & \dfrac{\partial f_1}{\partial y_1} \end{bmatrix}
\]

is not zero at $(a, b)$. Since $f$ is a polynomial in $x, y$, the real functions $f_i$ are also
polynomials in $x_i, y_j$. So they have continuous derivatives.

(4.4) Lemma. Let $f(x, y)$ be a polynomial with complex coefficients. With the
above notation,

(i) $\dfrac{\partial f}{\partial y} = \dfrac{\partial f_0}{\partial y_0} + \dfrac{\partial f_1}{\partial y_0}\, i$, and

(ii) the Cauchy-Riemann equations $\dfrac{\partial f_0}{\partial y_0} = \dfrac{\partial f_1}{\partial y_1}$ and $\dfrac{\partial f_0}{\partial y_1} = -\dfrac{\partial f_1}{\partial y_0}$ hold.

Proof of the Lemma. Since $f$ is a polynomial and since the derivative of a sum
is the sum of the derivatives, it is enough to prove the lemma for the monomials
$c y^n = (c_0 + c_1 i)(y_0 + y_1 i)^n$. For these monomials, the lemma follows from the
product rule for differentiation, by induction on $n$. □

We return to the proof of Theorem (4.3). By hypothesis, $f_i(a_0, a_1, b_0, b_1) = 0$.
Also, since $\frac{\partial f}{\partial y}(a, b) \neq 0$, we know by (4.4i) that $\frac{\partial f_0}{\partial y_0} = d_0$ and $\frac{\partial f_1}{\partial y_0} = d_1$ are not
both zero. By (4.4ii), the Jacobian determinant is

\[
\det \begin{bmatrix} d_0 & -d_1 \\ d_1 & d_0 \end{bmatrix} = d_0^2 + d_1^2 > 0.
\]

This shows that the hypotheses of the Implicit Function Theorem (4.1) are satisfied. □
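
The Cauchy-Riemann equations and the resulting formula for the Jacobian determinant can be checked numerically for a particular polynomial. The sketch below is illustrative only: the polynomial `f`, the sample point, and the step size `h` are arbitrary assumptions, and the partial derivatives are approximated by central differences rather than computed exactly.

```python
def f(x, y):
    """A sample complex polynomial (an arbitrary choice for illustration)."""
    return y**3 + x * y - 2


def df_dy(x, y):
    """Its complex partial derivative with respect to y, computed by hand."""
    return 3 * y**2 + x


def real_jacobian(x, y, h=1e-6):
    """Approximate the partials of (f0, f1) in (y0, y1) by central differences."""
    d_y0 = (f(x, y + h) - f(x, y - h)) / (2 * h)            # vary the real part y0
    d_y1 = (f(x, y + 1j * h) - f(x, y - 1j * h)) / (2 * h)  # vary the imaginary part y1
    return ((d_y0.real, d_y1.real), (d_y0.imag, d_y1.imag))


x, y = 0.5 - 1.0j, 1.2 + 0.7j
(f0_y0, f0_y1), (f1_y0, f1_y1) = real_jacobian(x, y)
determinant = f0_y0 * f1_y1 - f0_y1 * f1_y0

assert abs(f0_y0 - f1_y1) < 1e-6  # first Cauchy-Riemann equation
assert abs(f0_y1 + f1_y0) < 1e-6  # second Cauchy-Riemann equation
assert abs(determinant - abs(df_dy(x, y)) ** 2) < 1e-5  # d0^2 + d1^2
```

The last assertion reflects the identity $d_0^2 + d_1^2 = |\partial f / \partial y|^2$, so the real Jacobian determinant is nonzero exactly where the complex derivative is nonzero.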
1. Set Theory

1. Let $\varphi\colon \mathbb{Z} \to \mathbb{R}$ be the map defined by $\varphi(n) = n^3 - 3n + 1$.

(a) Is $\varphi$ injective?

(b) Determine $\varphi^{-1}(U)$, where $U$ is the interval (i) $[0, \infty)$, (ii) $[2, 4]$, (iii) $[4, 12]$.

2. Give an example of a map $\varphi\colon S \to S$ from an infinite set to itself which is surjective but
not injective, and one which is injective but not surjective.

3. Let $\varphi\colon S \to T$ be a map of sets.

(a) Let $U$ be a subset of $T$. Prove that $\varphi(\varphi^{-1}(U)) \subset U$ and that if $\varphi$ is surjective, then
$\varphi(\varphi^{-1}(U)) = U$.

(b) Let $V$ be a subset of $S$. Prove that $\varphi^{-1}(\varphi(V)) \supset V$ and that if $\varphi$ is injective, then
$\varphi^{-1}(\varphi(V)) = V$.

4. Let $\varphi\colon S \to T$ be a map of nonempty sets. A map $\psi\colon T \to S$ is a left inverse if
$\psi \circ \varphi\colon S \to S$ is the identity and a right inverse if $\varphi \circ \psi\colon T \to T$ is the identity. Prove that
$\varphi$ has a left inverse if and only if it is injective and has a right inverse if and only if it is
surjective.

5. Let $S$ be a partially ordered set.

(a) Prove that if $S$ contains an upper bound $b$ for $S$, then $b$ is unique, and also is a
maximal element.

(b) Prove that if $S$ is totally ordered, then a maximal element $m$ is an upper bound for $S$.

6. (a) Describe precisely which real numbers have more than one decimal expansion and
how many expansions such a number has.

(b) Fix the proof of Proposition (1.7).

7. Use Zorn’s Lemma to prove that every ideal $I \neq R$ is contained in a maximal ideal. Do
this by showing that the set $S$ of all ideals $I \neq R$, ordered by inclusion, is inductive.

2. Techniques of Proof

1. Use induction to find a closed form for each of the following expressions.

(a) $1 + 3 + 5 + \cdots + (2n + 1)$

(b) $1^2 + 2^2 + 3^2 + \cdots + n^2$

(c) $1 + 1/2 + 1/3 + \cdots + 1/n$

(d) $\dfrac{1}{1 \cdot 2} + \dfrac{1}{2 \cdot 3} + \cdots + \dfrac{1}{n(n + 1)}$

2. Prove that $1^3 + 2^3 + \cdots + n^3 = (n(n + 1))^2/4$.

3. Prove that $1/(1 \cdot 2) + 1/(2 \cdot 3) + \cdots + 1/(n(n + 1)) = n/(n + 1)$.

4. Let $S, T$ be finite sets.

(a) Let $\varphi\colon S \to T$ be an injective map. Prove by induction that $|S| \leq |T|$ and that if
$|S| = |T|$, then $\varphi$ is bijective.

(b) Let $\varphi\colon S \to T$ be a surjective map. Prove by induction that $|S| \geq |T|$ and that if
$|S| = |T|$, then $\varphi$ is bijective.

5. Let $n$ be a positive integer. Show that if $2^n - 1$ is a prime number, then $n$ is prime.

6. Let $a_n = 2^{2^n} + 1$. Prove that $a_n = a_0 a_1 \cdots a_{n-1} + 2$.

7. A polynomial with rational coefficients is called irreducible if it is not constant and if it is
not a product of two nonconstant polynomials whose coefficients are rational numbers.
Use complete induction to prove that every polynomial with rational coefficients can be
written as a product of irreducible polynomials.

8. Prove parts (b) and (c) of Theorem (1.6).


3. Topology 

1. Let $S$ be a subset of $\mathbb{R}^k$, and let $f, g$ be continuous functions from $S$ to $\mathbb{R}$. Determine
whether or not the following subsets are open or closed in $S$.

(a) {/(X)>0} (b) {f(X) ^2} (c) {f(X) < 0, g(X) > 0} 

(d) {/(X) < 0, ^(X) < 0} (e) {f(X)^0,g(X) = 0} (f) {f(X) E 1} 

(g) {f(X) E Q} 

2. Let $X \in \mathbb{R}^n$. Determine whether or not the following sets are open or closed.

(a) {rX\r E IR, r >0} (b) {rX\r E U, r > 0} 

3. (a) Let $P = (p_{ij})$ be an invertible matrix, and let $d = \det P$. We can define a map
$GL_n(\mathbb{R}) \to \mathbb{R}^{n^2+1}$ by sending $P \rightsquigarrow (p_{ij}, d^{-1})$. Show that this rule embeds $GL_n(\mathbb{R})$ as a
closed set in $\mathbb{R}^{n^2+1}$.

(b) Illustrate this map in the case of $GL_1(\mathbb{R})$.

4. Prove that the product $M \times M'$ of two manifolds $M$, $M'$ is a manifold.

5. Show that $SL_2(\mathbb{R})$ is not a compact group.

6* (a) Sketch the curve C: xl = x] — x\ in IR 2 . 

(b) Prove that this locus is a manifold of dimension 1 if the origin is deleted. 


4. The Implicit Function Theorem

1. Prove Lemma (4.4).

2. Prove that $SL_2(\mathbb{R})$ is a manifold, and determine its dimension.

3. Let $f(x, y)$ be a complex polynomial. Assume that the equations

\[ f = 0, \quad \frac{\partial f}{\partial x} = 0, \quad \frac{\partial f}{\partial y} = 0 \]

have no common solution in $\mathbb{C}^2$. Prove that the locus $f = 0$ is a manifold of dimension 2.
NOTATION

$A_n$          the alternating group, Chapter 2 (4.7)
$B_{X,r}$      the open ball of radius $r$ about the point $X$, Appendix (3.1)
$\mathbb{C}$   the field of complex numbers, Chapter 2 (1.11)
$C_n$          the cyclic group of order $n$, Chapter 5 (3.4)
$D_n$          the dihedral group, Chapter 5 (3.4)
det            determinant, Chapter 1 (3.4)
$\mathbb{F}_p$ the prime field $\mathbb{Z}/(p)$, Chapter 3 (2.4)
$GL_n$         the general linear group, Chapter 2 (1.13)
$I$            the identity matrix, Chapter 1 (1.14)
$I$            the icosahedral group, Chapter 5 (9.1)
im $\varphi$   the image of the map $\varphi$, Appendix (1.3)
ker $\varphi$  the kernel of the homomorphism $\varphi$, Chapter 2 (4.5)
$\ell^\infty$  the space of bounded sequences, Chapter 3 (5.2)
$M$            the group of motions of the plane, Chapter 4 (5.15), Chapter 5 (2.1)
$N(H)$         the normalizer of $H$, Chapter 6 (3.7)
$\mathbb{N}$   the set of positive integers, or natural numbers, Chapter 10 (2.1)
$O_n$          the orthogonal group, Chapter 5 (5.3), Chapter 8 (1.3)
$O_{3,1}$      the Lorentz group, Chapter 8 (1.4)
$PSL_n$        the projective group, Chapter 8 (8.2)
$\mathbb{R}$   the field of real numbers, Chapter 2 (1.11)
$\mathbb{R}^n$ the space of $n$-dimensional vectors, Chapter 3 (1.1)
$S_n$          the symmetric group, Chapter 2 (1.14)
$S^n$          the $n$-dimensional sphere, Chapter 8 (2.6)
$SL_n$         the special linear group, Chapter 2 (4.6), Chapter 8 (1.8)
$SO_n$         the special orthogonal group, Chapter 4 (5.4), Chapter 8 (1.8)
$SP_{2n}$      the symplectic group, Chapter 8 (1.6)
$SU_n$         the special unitary group, Chapter 8 (1.8)
$T$            the tetrahedral group, Chapter 5 (9.1)
$t$            (superscript $t$) the transpose of a matrix, Chapter 1 (2.24)
tr             trace, Chapter 4 (4.18)
$U_n$          the unitary group, Chapter 7 (4.15), Chapter 8 (1.8)
$Z$            the center of a group, Chapter 2 (4.10)
$\mathbb{Z}$   the ring of integers, Chapter 2 (1.11)
$Z(x)$         the centralizer of $x$, Chapter 6 (1.5)
$*$            If $A$ is a complex matrix, then $A^* = \overline{A}^t$, Chapter 7 (4.7)
               In a matrix display, $*$ denotes an undetermined entry, Chapter 1 (1.15)
               The starred exercises are some of the more difficult ones.
$+$            (superscript $+$) The group whose law of composition is addition,
               Chapter 2 (1.1)
$\times$       (superscript $\times$) The group whose law of composition is multiplication,
               Chapter 2 (1.1)
$\oplus$       direct sum, Chapter 3 (6.4), Chapter 12 (6.3)
$!$            factorial; $n!$ is the product of the integers $1, 2, \ldots, n$.
$\binom{n}{k}$ a binomial coefficient, Appendix (2.1)
$[\mu]$        the largest integer $\leq \mu$, Chapter 11 (10.23)

If $S$ and $T$ are sets, we use the following notation:

$|S|$          the number of elements, also called the order of the set $S$.
$s \in S$      $s$ is an element of $S$.
$S \subset T$  $S$ is a subset of $T$, or $S$ is contained in $T$. In other words, every element of
               $S$ is also an element of $T$.
$T \supset S$  $T$ contains $S$, which is the same as $S \subset T$.
$S < T$        $S$ is a proper subset of $T$, meaning that it is a subset, and that $T$ contains
               an element which is not a member of $S$.
$T > S$        This is the same as $S < T$.
$T - S$        This notation is used only when $S$ is a subset of $T$, and then it denotes the
               complement of $S$ in $T$, the set of all elements which are in $T$ but not in $S$:
               $T - S = \{x \mid x \in T \text{ but } x \notin S\}$.
$S \cap T$     The intersection of the sets $S$ and $T$, which is the set of all elements in
               common to $S$ and $T$.
$S \cup T$     The union of the sets $S$ and $T$, which is the set of all elements $x$ which are
               contained in at least one of the sets $S$ and $T$.
$S \times T$   the product set. Its elements are ordered pairs $(s, t)$ of elements:
               $S \times T = \{(s, t) \mid s \in S, t \in T\}$.
               Since the parentheses have other meanings, we sometimes leave them off,
               and denote an element of the product set by $s, t$.
$\varphi\colon S \to T$  a map $\varphi$ from $S$ to $T$, or a function whose domain is $S$ and whose range is
               $T$.
$s \rightsquigarrow t$  The wiggly arrow indicates that the map under consideration sends the
               element $s$ to the element $t$, i.e., that $\varphi(s) = t$.
□              This symbol indicates that a digression in the text, such as a proof or an
               example, has ended, and that the text returns to the main thread.
Appendix:

J. R. Munkres, Topology: A First Course, Prentice Hall, Englewood Cliffs, NJ, 1975.

W. Rudin, Principles of Mathematical Analysis, 3rd ed., McGraw-Hill, New York,
Abel, 570
Abelian character, 325
Abelian group, 451
Abelian groups, Structure Theorem, 472
Addition:
    in a field, 83
    matrix, 2
    in a module, 450
    in a ring, 346
    vector, 78, 86
Adjoint matrix, 29, 250
Adjoint representation, 304
Adjunction:
    of an element, 365
    symbolic, 506
Affine group, 306
Algebra:
    Fundamental Theorem of, 527
    Lie, 291
Algebraically closed field, 527
Algebraically dependent, 525
Algebraically independent, 525
Algebraic closure, 527
Algebraic curve, 376
    irreducible, 386
Algebraic element, 493
Algebraic extension, 499
Algebraic geometry, 373
Algebraic group, 289, 299
Algebraic integer, 410
Algebraic number, 345
Algebraic variety, 373
Algorithm, Todd-Coxeter, 223
Almost everywhere, 516
Alternating group, 52
Angle:
    between vectors, 126, 248
    trisection of, 505
Annihilator, 484
Antipodal point, 277
Arithmetic:
    Fundamental Theorem of, 390
    modular, 64
Arrow, 586
    wiggly, 586
Ascending chain condition, 393, 467
Associate elements, 392
Associative law, 5, 39
Automorphism, 176
    of a field, 539
    of a group, 50
Averaging over a group, 311
Axiomatic characterization of determinant, 23
Axiom of choice, 101, 374, 588
Axioms, Peano, 348

Baker, 416
Ball, open, 593
Basis:
    change of, 98
    of a module, 454
    orthogonal, 244
    orthonormal, 126, 241
    standard, 26, 90, 454
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Basis: {continued ) 
symplectic, 261 
theorem, 469 
transcendence, 525 
of a vector space, 90 
Bezout bound, 376 
Bijection, 586 
Bijective map, 586 
Bilateral symmetry, 155 
Bilinear form, 238 
Binomial coefficient, 589 
Biquadratic extension, 539 
Block ， Jordan，480 
Block multiplication, 8 
Bound, upper, 588 
Bounded set, 595 
Bracket, Lie ， 290, 291 
Branched covering, 378, 520 
isomorphism of, 519 
Branch points, 521 
Bruhat decomposition, 236 
Bundle, vector，483 
Burnside’s Formula，196 

Cancellation Law ， 42, 84, 369 
for ideals, 422 

Canonical form, rational, 479 
Cantor, 587 

Cardano’s Formula，544 
Cardinality of a set, 586 
Case analysis，589 
Cauchy-Riemann equations, 598 
Cayley-Hamilton Theorem, 153, 488 
Cayley’s Theorem, 197 
Cayley transform, 306 
Center: 

of gravity, 163 
of a group, 52 
Centralizer, 198 
Centrally symmetric set, 426 
Chain condition, ascending s 393, 467 
Change of basis，98 
matrix of, 98 
Character, 316 
abelian, 325 
dimension of, 317 
irreducible，316 
Character group, 325 
Characteristic; 
of a field, 86 
of a ring，358 

Characteristic polynomial, 122 
Characteristic subgroup，234 
Characteristic value, 117 
Characteristic vector，117 
Character table，320 
Chinese remainder theorem, 303, 441 
Choice, axiom of, 101， 374 


Circulant, 268 
Class: 

congruence, 56, 64 
conjugacy, 198 
equivalence, 54 
ideal, 417, 425 
isomorphism, 49 
residue, 64 
Class Equation, 198 
Class function, 318 
Class group, 426
Class number, 417, 426
Classical group, 270 
Classification of groups, 49, 299 
Closed set, 594 
Closed word, 233 
Closure, algebraic, 527 
Coefficient, leading, 350
Column index, 1 
Column vector, 2 
Combination, linear, 87 
Commutative law, 39 
Commutative ring, 346 
Commutator, 222 
Commutator subgroup, 234 
Compact group, 313
Compact set, 595 
Complement, orthogonal, 243 
Complete expansion of the determinant, 28 
Complete induction, 380, 592 
Complete set of relations, 464
Complex algebraic group, 299 
Complex representation, 310 
Component, connected, 77
Composition, law of, 39
Conductor, 387 
Congruence: 
class, 56, 64
of integers, 64 
Congruent matrices, 270 
Conic, 255
Conjugacy class, 198 
Conjugate element, 51 
Conjugate linearity, 250 
Conjugate representation, 309 
Conjugate subfield, 558 
Conjugate subgroup, 180 
Conjugation, 50, 198
Connected component, 77 
Connected set, 595 
Connected, simply, 278
Constructible point, line, circle, 500
Constructible real number, 502
Construction, ruler and compass, 500
Content, 399 

Continuous function, map, 595
Continuous representation, 313 
Contradiction, proof by, 592
Convex set, 427
Coordinates, 94
Coordinate vector, 94, 455
Correspondence theorem, 75, 360, 452
Coset, 57
double, 77 
left, 57 
right, 59 

Coset multiplication, 68 
Coset space, 178 
Counting Formula, 58, 180
Covering: 

branched, 378, 520 
Cramer’s Rule, 31 
Crystallographic group, 172, 187
Crystallographic restriction, 169
Crystal system, 187
Cubic, resolvent, 564
Cubic equation, 543
Cubic extension, 497
Curve, algebraic, 376
Cut and paste, 520
Cycle:

decomposition, 213 
notation, 213 

Cyclic group, 46, 164, 184
Cyclic permutation, 25
Cyclotomic field or extension, 567
Cyclotomic polynomial, 405 

Decomposition, polar, 304 
Defining relations for a group, 221 
Definition, 585 

inductive or recursive, 348 
Degree: 

of an algebraic curve, 387 
of an element, 497 
of a field extension, 497 
of a polynomial, 350 
of a rational function, 535 
transcendence, 526
weighted, 550 

Dependence, linear, 88, 101 
Determinant, 20, 453 

axiomatic characterization, 23 
complete expansion of, 28 
of an operator, 123 
Vandermonde, 36 
Diagonal entries of a matrix, 6 
Diagonalization, 130, 458
Diagonal matrix, 6 
Dichotomy, 589 
Differential equation, 135 
Dihedral group, 164, 184 
Dimension: 

of a character, 317 
of a linear group, 293 
of a manifold, 596 


of a representation, 308 
of a vector space, 93 
Dimension formula, 110 
Diophantine equation, 410, 437
Direct sum: 

of representations, 315 
of submodules, 471
of subspaces, 102

Discrete group of motions, 166, 167
Discriminant, 548
of a cubic, 546

of a quadratic number field, 413 
Distance between vectors, 125 
Distinct elements, 585 
Distributive law, 5 
Divide and conquer, 589 
Divisor, 392 

greatest common, 46, 395
proper, 392 
zero, 368 
Domain: 

Euclidean, 397
fundamental, 195 
integral, 368 
of a map, 585
principal ideal, 396
unique factorization, 394 
Dot product, 125, 237 
Double coset, 77 
Double covering, 277 

Echelon matrix, 14 
Eigenvalue, 117 
Eigenvector, 117 
Eisenstein Criterion, 404 
Element: 

algebraic, 493 
associate, 392
conjugate, 51

of a field extension, primitive, 552

ideal, 356 

idempotent, 382 

identity, 41

image of, 585 

infinitesimal, 365 

invertible, 42 

irreducible, 392 

of a lattice, primitive, 172 

maximal, 588 

nilpotent, 365

norm of, 414

order of, 47 

prime, 395 

representative, 55 

transcendental, 493 

unipotent, 381

unit, 347
Elementary column operation, 18
Elementary matrix, 11 
Elementary row operation, 12 
Elementary symmetric function, 547 
Elements: 
distinct, 585 
independent, 454
Elimination, Gaussian, 12
Ellipsoid, 258 
Entries: 

diagonal, 6 
of a matrix, 1 
Equation: 
class, 198 
Diophantine, 437 
homogeneous, 16 
linear, 4
quartic, 560
quintic, 570

Equations, Cauchy-Riemann, 598
Equivalence class, 53
Equivalence relation, 53 
determined by a map, 55 
Eratosthenes, sieve of, 403
Euclidean domain, 397 
Euclidean space, 247 
Euler, 410 

Evaluation of polynomials, 353 
Even permutation, 26 
Exceptional group, 299
Existence of factorizations, 393
Existence theorem, Riemann, 519
Expansion by minors, 20
Exponential of a matrix, 138
Expressible by radicals, 571
Extension: 

algebraic, 500
biquadratic, 539 
cubic, 497 
cyclotomic, 567
Galois, 540 
Kummer, 566 
pure transcendental, 525 
quadratic, 497 
ring, 364

transcendental, 525
Extension field, 492
External law of composition, 81 

Factorization: 

existence of, 393 
irreducible, 395 
prime, 395 
Faithful module, 491 
Faithful operation, 183 
Faithful representation, 308 
Faltings, 437
Fermat Equation, 409 


Fermat's last theorem, 437 
Fermat’s Theorem, 105 
Fibonacci numbers, 154 
Fibration, Hopf, 280
Fibre of a map, 55 
Field, 83 

algebraically closed, 527 
automorphism of, 539
characteristic of, 86
cyclotomic, 567
finite, 492, 509
fixed, 540
function, 493, 516 
intermediate, 542 
number, 492 
order of, 509 
prime, 83 
splitting, 540
Field extension, 492
degree of, 497
finite, 497 
generators of, 495 

Field extensions, isomorphism of, 496 

Field of fractions, 369 

Finite-dimensional vector space, 91

Finite extension, 497 

Finite field, 492, 509

Finite linear combination, 100 

Finitely generated module, 454 

Finite set, 586 

Finite simple group, 299

First Isomorphism Theorem, 68, 360, 452

Fixed field, 540

Fixed point, 162

Fixed Point Theorem, 162, 199 

Form: 

bilinear, 238 
Hermitian, 250 
indefinite, 243
invariant, 311 
Jordan, 480 
Killing, 304
Lorentz, 243 
matrix of, 239 
nondegenerate, 244 
null space of, 244 
positive definite, 241, 252
quadratic, 256
restriction of, 248
signature of, 245
skew-symmetric, 238, 260 
symmetric, 238 
Formal linear combination, 94
Four group, 48
Fraction, 369
Fraction field, 369
Fractions, partial, 441
Free abelian group, 223
Free group, 219

mapping property of, 220 
Free module, 454
Free semigroup, 217 
Frobenius norm, 153 
Frobenius reciprocity, 343 
Function, 586 
class, 318
continuous, 594
inverse, 586
multi-valued, 519 
partially symmetric, 561 
rational, 369, 516 
single-valued, 519
size, 397 
successor, 348 
symmetric, 547 
Function field, 493, 516 
Fundamental domain, 195
Fundamental Theorem:
of Algebra, 527
of Arithmetic, 390 

Galois, 570 

Galois extension, 540 

Galois group, 539, 558 

Galois theory, main theorem of, 542

Gaussian elimination, 12

Gauss integers, 345 

Gauss prime, 406

Gauss’s Lemma, 400 

General linear group, 43, 453 

Generators: 

of a field extension, 495 
of a group, 220
of a module, 454 
of a subgroup, 48 
Genus, 534
G-invariant form, 311
G-invariant subspace, 314 
G-invariant transformation, 325 
Glide reflection, 157 
Glide symmetry, 156 
Gram-Schmidt procedure, 241 
Gravity, center of, 163
Greatest common divisor, 46, 395
Group, 42
abelian, 42
affine, 306
algebraic, 289, 299 
alternating, 52 
automorphism of, 50 
center of, 52 
character, 325
class, 426 
classical, 270 
compact, 313 
complex algebraic, 299 


crystallographic, 172, 187
cyclic, 46, 164, 184
dihedral, 164, 184
discrete, 166, 167
exceptional, 299
free, 219
free abelian, 222
Galois, 539, 558
general linear, 43, 453 
generators of, 220 
icosahedral, 184 
ideal class, 429 
infinite cyclic, 46
lattice, 172 
of Lie type, 300 
linear, 270 
Lorentz, 271 
Mathieu, 300
of motions, 127 
octahedral, 184 
order of, 47
orthogonal, 124, 271 
point, 168 
product, 61 
projective, 296
quaternion, 48
quotient, 67
real algebraic, 289 
relations in, 220 
rotation, 125
simple, 201, 299
special linear, 271 
special orthogonal, 124, 271 
special unitary, 271 
spin, 278 
sporadic, 300 
symmetric, 43 
of symmetries, 156 
symplectic, 271 
tetrahedral, 184 
translation, 167 
translation in, 292 
triangle, 235 
unitary, 252, 271 

Group homomorphism, 51
kernel of, 51 

Group operation, 176, 309 

Group representation, 308 

Groups: 

abelian, Structure Theorem, 472 
classification of, 49 
homomorphism of, 51 
isomorphism of, 49

Haar measure, 314 

Half integer, 413 

Half lattice point, 417 

Hermitian form, 250
Hermitian matrix, 251
Hermitian operator, 253
Hermitian product, 250
Hermitian symmetry, 250
Hilbert Basis Theorem, 469
Hilbert Nullstellensatz, 371
Homeomorphism, 595
Homogeneity, 292
Homogeneous equation, 16
Homomorphism: 
of groups, 51 
image of, 51 
of modules, 451 
of rings, 353 
Hopf fibration, 276, 280
Hyperboloid, 258
Hypervector, 96

Icosahedral group, 184 
Ideal, 356 

generated by a set, 357 
maximal, 370 
norm of, 425 
prime, 420 
principal, 357 
product, 419 
proper, 357 
unit, 357
zero, 357 

Ideal class, 417, 425 

Ideal class group, 429 

Ideal element, 356

Ideals, cancellation law for, 422 

Idempotent element, 382 

Identities, permanence of, 456

Identity, 456

Identity element, 41

Identity matrix, 6 

Image: 

of an element, 586
of a homomorphism, 51
inverse, 586
of a map, 586 
Imaginary part, 137 
Inclusion, ordering by, 588
Inclusion map, 51
Indefinite form, 243
Independent elements, 454
Independent, linearly, 88, 101
Independent submodules, 472
Independent subspaces, 102
Index:

column, 1 
multi, 352 
row, 1 

of a subgroup, 57 
Indices, 25 

Induced law of composition, 44 


Induced representation, 343 
Induction, 590

complete, 380, 592
Induction axiom, 348 
Inductive definition, 348 
Inequality: 

Schwarz, 248 
triangle, 248
Infinite cyclic group, 46 
Infinite dimensional space, 100 
Infinitesimal element, 287, 365 
Infinitesimal tangent, 288 
Initial conditions, 137
Injection, 586 

Injective function, map, 586 
Integer: 

algebraic, 410
half, 413
square-free, 411
Integers:

congruence of, 64
Gauss, 345 
ring of, 348, 413 
Integral domain, 368
Intermediate field, 542
Interpolation, Lagrange, 444 
Intersection: 

multiplicity of, 387 
of subgroups, 60 
of subsets, 602 
Invariant form, 311 
Invariant subspace, 116, 314 
Inverse, 42 
left, 7
right, 7 

Inverse function, 586 

Inverse image, 55, 586 

Inverse matrix, 7 

Invertible element, 42

Invertible matrix, 6 

Irreducible algebraic curve, 387 

Irreducible character, 316 

Irreducible element, 392 

Irreducible factorization, 395

Irreducible polynomial, 390 

Irreducible polynomial for an element, 494

Irreducible representation, 315 

Isometry, 156

Isomorphic field extensions, 496 
Isomorphism: 

of branched coverings, 519 
class, 49

of field extensions, 496
of groups, 49 
of modules, 451 
of representations, 316 
of rings, 353
of vector spaces, 87
Jacobi identity, 291
Jordan block, 480 
Jordan form, 480 

Kaleidoscope, 166 
Kernel: 

of a group homomorphism, 52
of a linear transformation, 110
of a module homomorphism, 451
of a ring homomorphism, 356
Killing form, 304
Klein four group, 48
Kronecker, 403, 570 
Kummer extension, 566 

Lagrange, 560 
Lagrange interpolation, 444 
Lagrange’s Theorem, 58 
Latitude, 274 
Lattice, 168
Lattice group, 172
Lattice point, half, 417
Lattices, similar, 397, 425 
Laurent polynomials, 367 
Law of composition, 39
external, 80 
induced, 44 

Leading coefficient, 350
Left coset, 57 
Left inverse, 7 
Left multiplication, 9, 176 
Left operation, 176 
Left translation, 292 
Length of a vector, 125, 247 
Lie algebra, 291 
Lie bracket, 290 
Lie type, group of, 299
Line, 401
tangent, 387 

Linear combination, 10, 87 
finite, 100 
formal, 94 
Linear equation, 8
Linear group, 270 
dimension of, 293 
Linearity, conjugate, 250
Linearly dependent, 88, 101
Linearly independent, 88, 101
Linear operator, 270 
Linear relation, 88 
Linear transformation, 109 
kernel of, 110
matrix of, 112
restriction of, 116
Localization of a ring, 385
Longitude, 274 
Lorentz form, 243 
Lorentz group, 271 


Lorentz transformation, 271 
Lüroth’s Theorem, 555

Main Lemma, 422

Main theorem of Galois theory, 542

Manifold, 596 

Map: 

bijective, 586
continuous, 595
domain of, 585
fibre of, 55 
image of, 585 
inclusion, 51 
injective, 586
range of, 585
surjective, 586 
zero, 353 
Mapping property: 

of the free group, 220
of products, 62 
of quotient groups, 221 
of quotient modules, 452 
of quotient rings, 360
Maschke’s Theorem, 316
Matrices: 

congruent, 270 
similar, 116 
Matrix, 1

adjoint, 29, 251
of change of basis, 98
diagonal, 6
elementary, 11
exponential of, 138 

of a form, 239
Hermitian, 251

identity, 6
inverse, 7 
invertible, 6 

of a linear transformation, 112 
nilpotent, 32 
normal, 259 
orthogonal, 124 
permutation, 25 
positive, 119 

positive definite, 241, 252
presentation, 465
row echelon, 14
scalar, 27 

skew-symmetric, 260 
symmetric, 238 
trace of, 98 
transpose, 18 
triangular, 6 
unitary, 252
upper triangular, 6 
zero, 6 

Matrix addition, 2 
Matrix entries, 1
Matrix multiplication, 3
Matrix representation, 308
Matrix unit, 11
Mathieu group, 300
Maximal element, 588
Maximal ideal, 370 
Measure, Haar, 313 
Minimal polynomial, 489 
Minkowski’s Lemma, 427
Minors, 153, 484-5, 491 
Minors, expansion by, 20 
Modular arithmetic, 64 
Module, 450 
basis of, 454 
faithful, 491 
finitely generated, 454 
free, 454 

generators of, 454 
presentation of, 465
rank of, 455
relations in, 464
simple, 484
Modules:

direct sum of, 471
homomorphism of, 451
isomorphism of, 451
product of, 474 
Structure Theorem for, 475 
Monic polynomial, 350 
Monomial, 350
Monster, 300 
Motion: 

orientation-preserving, reversing, 128, 157
rigid, 127, 156
Motions, group of, 127
Multi-index, 352 
Multiple root, 377, 508 
Multiplication: 
coset, 68
left, 9, 176 
matrix, 3 
right, 18 
scalar, 2, 78, 86
Multiplication table, 40 
Multiplicative set, 384 
Multiplicity of intersection, 387 
Multi-valued function, 518 


Nakayama Lemma, 491
Natural numbers, 348 
Negative definite, 264 
Neighborhood, 594 
Nilpotent element, 365 
Nilpotent matrix, 32 
Nilpotent operator, 146 
Nilradical, 381 


Noetherian ring, 468 
Noncommutative ring, 345 
Nondegenerate form, 244
Nonsingular operator, 121 
Nonsingular point, 387 
Norm: 

of an element, 414
Frobenius, 153 
of an ideal, 425 
Normalizer, 204 
Normal matrix or operator, 259 
Nullity, 110

Null space of a form, 244
Nullstellensatz, 371 
Null vector, 244 
Number: 

algebraic, 345 
class, 417, 426 
Fibonacci, 154
transcendental, 345
Number field, 450
quadratic, 411
Numbers, natural, 348 


Octahedral group, 184 
Odd permutation, 26 
One-parameter subgroup, 283 
Open ball, 593 
Open set, 594
Operation: 

elementary, 18 
faithful, 183
of a group, 176, 309 
left, 176 
partial, 227 
restriction of, 180 
transitive, 177 
Operator, 115 

determinant of, 123 
Hermitian, 253 
linear, 270 
nilpotent, 146 
nonsingular, 121 
normal, 259 
orthogonal, 126, 255 
row, 12 
shift, 120, 477 
singular, 121 
symmetric, 255 
trace of, 123 
unipotent, 153 
unitary, 253 
Orbit, 177
Order: 

of an element, 47 
of a finite field, 509
of a group, 47
by inclusion, 588
partial, 588
of a set, 587
total, 588 

Ordered set, 87, 588 

Orientation-preserving or reversing motion, 128, 157

Orthogonal basis, 244 

Orthogonal complement, 243 

Orthogonal group, 124, 270 

Orthogonality relations, 318 

Orthogonal matrix, 124 

Orthogonal operator, 126, 255 

Orthogonal projection, 249 

Orthogonal representation of SU 2 , 276 

Orthogonal vectors, 126, 241, 252

Orthonormal basis, 126, 241, 252

p-group, 199
Paraboloid, 258
Partial fractions, 441 
Partially symmetric function, 561 
Partial operation, 227 
Partial ordering, 588 
Partition, 53 
Path, 77

Path-connected, 77 
Peano’s axioms, 348 
Permanence of identities, 456 
Permutation, 25, 43, 211, 586 
cyclic, 25 
even, 26
odd, 26 
sign of, 26 

Permutation matrix, 25 
Permutation representation, 182, 322 
Pick’s Theorem, 490 
Pigeonhole principle, 587
Pivot, 14 

Plane, translation in, 157
Point, fixed, 162
nonsingular, 387 
singular, 387, 405 
Point group, 168 
Polar decomposition, 304 
Pole, 373 
Polynomial, 350 
characteristic, 121 
cyclotomic, 405 
degree of, 350 
evaluation of, 353 
irreducible, 390, 494
Laurent, 367 
minimal, 489 
monic, 350 
primitive, 399 
residue of, 354 
Positive definite, 241, 252 


Positive matrix, 119 
Presentation matrix, 465 
Presentation of a module, 465 
Prime: 

Gauss, 406
ramified, 425
split, 425

Prime element, 395 

Prime factorization, 395 

Prime field, 83

Prime ideal, 385, 420 

Primitive element of a field extension, 552

Primitive element of a lattice, 172

Primitive polynomial, 399 

Principal ideal, 357 

Principal ideal domain, 396 

Principle, Substitution, 353 

Product: 

mapping property of, 62 
of modules, 474
of subsets of a group, 66 
Product group, 61 
Product ideal, 419 
Product ring, 380 
Product set, 602 
Projection, 61 
orthogonal, 249 
Projective group, 296 
Projective space, 277 
Proper divisor, 392 
Proper ideal, 357 
Proper subgroup, 45 
Proper subspace, 87
Pure transcendental extension, 525 
Pythagoras’ Theorem, 125, 503


Quadratic extension, 497
Quadratic form, 256
Quadratic number field, 411
discriminant of, 413
Quadratic reciprocity, 440 
Quadric, 256 
Quartic equation, 560 
Quaternion group, 48 
Quaternions, 306 
Quillen, 482
Quintic equation, 570
Quotient group, 67

mapping property of, 221 
Quotient module, 452 

mapping property of, 452 
Quotient ring, 359

mapping property of, 360

Radicals, 571 
Ramified prime, 425 
Range of a map, 585
Rank, 111

of a free module, 455
Rational canonical form, 479
Rational function, 370, 516
degree of, 535
Ray, 280 

Real algebraic group, 289

Real algebraic set, 286

Real number, constructible, 502

Real part, 517

Real subfield, 568

Reciprocity: 

Frobenius, 343
quadratic, 440
Recursive definition, 348 
Reduced word, 217 
Reducible representation, 315 
Reduction, row, 12 
Reflection, 157 
glide, 157

Reflexive relation, 53 
Regular representation, 323 
Relation: 

equivalence, 53 
linear, 88
reflexive, 53
symmetric, 53
transitive, 53
Relations: 

complete set, 464 
in a group, 220
in a module, 464
orthogonality, 318 
in a ring, 361 
Relation vector, 464 
Representation, 308 
adjoint, 304
complex, 310
conjugate, 330
continuous, 313 
dimension of, 308 
faithful, 308 
of a group, 308
induced, 343 
irreducible, 315 
matrix, 308 
permutation, 182, 322 
reducible, 315
regular, 322
sign, 320

of SU2, orthogonal, 276
unitary, 311
Representations:
direct sum of, 315
isomorphism of, 316
Representative element, 55 
Residue class, 64 
Residue of a polynomial, 354 


Resolvent cubic, 564 
Restriction: 

crystallographic, 169 
of a form, 248 

of a linear transformation, 116 
of an operation, 181 
to a subgroup, 60 
Riemann existence theorem, 519 
Riemann surface, 376, 518
Right coset, 59
Right inverse, 7
Right multiplication, 18
Rigid motion, 127, 156
Ring, 346

characteristic of, 358
of integers, 348, 413
localization of, 385
noetherian, 468 
noncommutative, 346 
quotient, 359 
relations in, 361
zero, 347 

Ring homomorphism, 353 
kernel of, 356 
Rings: 

extension of, 364 
homomorphism of, 353 
isomorphism of, 353
product of, 380 
Root: 

multiple, 508
of unity, 512
Rotation, 124, 157
Rotational symmetry, 156
Rotation group, 125
Row echelon matrix, 14
Row index, 1 
Row operator, 12 
Row reduction, 12 
Row vector, 2

Ruler and compass construction, 500 


Scalar, 2
Scalar matrix, 52
Scalar multiplication, 2, 78, 86
Schur’s Lemma, 326, 331, 484
Schwarz Inequality, 248
Second Isomorphism Theorem, 236, 484
Self-adjoint, 251
Semidefinite, 263
Semigroup, 77 
free, 217 
Set: 

bounded, 595
cardinality of, 586
centrally symmetric, 427
closed, 594
compact, 595
convex, 427
finite, 586
multiplicative, 384
open, 593-94
ordered, 87 
order of, 587 
real algebraic, 286 
Sheets, 520

Shift operator, 120, 477
Sieve, 403

Signature of a form, 245
Sign of a permutation, 26 
Sign representation, 320 
Similar lattice, 398, 425 
Similar matrices, 116 
Simple group, 201, 295
finite, 299
Simple module, 484
Simply connected, 278
Single-valued function, 518
Singular operator, 121
Singular point, 387, 405
Size function, 397
Skew-symmetric form, 238, 260
Skew-symmetric matrix, 260
Space: 

Euclidean, 247 
projective, 277 
vector, 86
Span, 88, 100
Special linear group, 271 
Special orthogonal group, 124, 271 
Special unitary group, 271 
Spectral Theorem, 253
Sphere, 273
Spin, 277
Spin group, 277 
Split prime, 425 
Splitting field, 540 
Sporadic group, 300 
Square-free integer, 411
Stabilizer, 177
Standard basis, 26, 90, 454
symplectic, 261

Standard Hermitian product, 250
Stark, 416
Structure Theorem: 

for abelian groups, 472 
for modules, 475 
Subfield, 82
conjugate, 559
real, 568
Subgroup, 44

characteristic, 234 
commutator, 234 
conjugate, 179 


generators of, 48 
index of, 57
normal, 52 
one-parameter, 283 
proper, 45 
restriction to, 60 
Sylow, 206 
transitive, 560 
Submodule, 451 
Submodules: 

direct sum of, 471 
independent, 472 
Subring, 345 
Subset, 602 
proper, 602 
Subspace, 79

G-invariant, 314
proper, 87

T-invariant, 116, 314
Subspaces:

direct sum of, 102
independent, 102 
sum of, 102 

Substitution Principle, 353
Successor function, 348 
Sum of subspaces, 102 
Surface, Riemann, 376, 518 
Surjection, 586 
Surjective map, 586
Suslin, 482
Sylow subgroup, 206
Sylow Theorem, 205
Sylvester's Law, 245
Symbolic adjunction, 506 
Symmetric form, 238 
Symmetric function, 547 
elementary, 547
Symmetric group, 43
Symmetric matrix, 238
Symmetric operator, 255 
Symmetric relation, 53 
Symmetries, group of, 156 
Symmetry, 156, 176
bilateral, 155 
glide, 156 
Hermitian, 250
rotational, 156
translational, 156
Symplectic basis, 261
Symplectic group, 271

Table: 

character, 320
multiplication, 40
Tangent, infinitesimal, 288
Tangent line, 387
Tangent vector, 286
Tangent vector field, 295
Tartaglia, 543

Tetrahedral group, 184 

Third Isomorphism Theorem, 236, 360, 484 

Todd-Coxeter Algorithm, 223 

Torus, 524 

Total ordering, 588 

Trace of a matrix or an operator, 123 

Transcendence basis, 525

Transcendence degree, 526

Transcendental element, 493

Transcendental extension, 525

Transcendental number, 346

Transform, Cayley, 306

Transformation: 

G-invariant, 325
linear, 109 
Lorentz, 271 
Transitive operation, 177 
Transitive relation, 53
Transitive subgroup, 560
Translation, 128, 157
in a group, 292
left, 292
in the plane, 157
Translational symmetry, 156
Translation group, 167
Transpose matrix, 18
Transposition, 25, 212 
Triangle group, 235 
Triangle Inequality, 248 
Triangular matrix, 6 
Trisection of an angle, 505
Trivial solution, 16

Union of subsets, 602 
Unipotent element, 381 
Unipotent operator, 153 
Unique factorization domain, 394
Unit, 347 
matrix, 10 

Unitary group, 252, 271 
Unitary matrix, 252 
Unitary operator, 253 
Unitary representation, 311 
Unit element, 347 
Unit ideal, 357
Unit vector, 124
Unity, root of, 512


Upper bound, 588 
Upper triangular matrix, 6 

Vandermonde determinant, 36 
Variety, algebraic, 373 
Vector, 78, 450 
characteristic, 117
column, 2
coordinate, 94, 455
length of, 125, 247
null, 244
relation, 464 
row, 2 
tangent, 286 
unit, 124 

Vector addition, 78, 86
Vector bundle, 483
Vector field, tangent, 295
Vectors: 

angle between, 126, 248 
distance between, 125 
orthogonal, 126, 241
Vector space, 86
basis of, 90
dimension of, 93
finite-dimensional, 91
infinite-dimensional, 100
Vector spaces: 

direct sum of, 102 
isomorphism of, 87

Weight, 550
Weighted degree, 549
Wiggly arrow, 586
Wilson’s Theorem, 105
Word, 217
closed, 233 
reduced, 217 
Word problem, 223

Zero divisor, 368
Zero ideal, 357
Zero map, 353
Zero matrix, 6
Zero ring, 347
Zorn’s Lemma, 588 



